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Data assimilation refers to any approach designed to improve the estimation 
of system states or parameters by exploiting the additional information con- 
tained in the observations of a dynamical system. In practice, a mathematical 
model is often not the complete description of the underlying physical pro- 
cess for various reasons. In this case, data assimilation can be used to narrow 
the gap between the estimation from a mathematical model and reality. 

In data assimilation applications, one is often confronted by three prob- 
lems: nonlinearity, non-Gaussianity and high dimensionality. This disserta- 
tion is thus dedicated to studying some data assimilation methods that aim 
to address these problems. 

First of all, we consider two types of nonlinear Kalman filter, the ensemble 
Kalman filter (EnKF) and the sigma point Kalman filter (SPKF), for data 
assimilation in nonlinear Gaussian systems. To reduce the computational 
cost of the SPKF in high dimensional systems, we introduce the reduced 
rank SPKF. 

Then we proceed to study the Gaussian sum filter (GSF) for data assim- 
ilation in nonlinear non-Gaussian systems. A GSF essentially consists of a 



set of parallel nonlinear Kalman filters. For this reason, we call a nonlin- 
ear Kalman filter a "base filter" of the GSF. The aforementioned EnKF and 
reduced rank SPKF can both be used as base filters of a GSF. To reduce 
the computational cost of a GSF, we also propose an auxiliary algorithm. 
We show that, if the reduced rank SPKF-based GSF is equipped with the 
auxiliary algorithm and implemented in parallel, it can achieve almost the 
same computational speed as the reduced rank SPKF itself. With suitable 
parameters in the reduced rank SPKF-based GSF or the EnKF-based GSF, 
the GSF normally outperforms its base filter. 
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Chapter 1 
Introduction 



1.1 Motivation 

In this dissertation, data assimilation is referred to as any technique that 
incorporates information from observations into a dynamical system in order 
to improve the estimation of system states or parameters [63]. Throughout 
this dissertation, our focus will be on the state estimation problem. In prin- 
ciple, the parameter estimation problem can be recast as a state estimation 
problem by treating the parameters as some unobserved system states |85j . 

To see the demand for data assimilation in practice, we note that a prac- 
tical model is usually not a complete description of the underlying physical 
process in the real world. For example, the limitation of our knowledge in 
understanding nature, and the model resolution that a modern computer can 
afford are two of the factors that make a practical model deviate, to some 
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extent, from the underlying physical process. As a result, data assimilation 
is often employed to narrow the gap between the estimation from a practical 
model and reality by exploiting the information contained in observations of 
the underlying physical process. 

An an example, we use global numerical weather prediction (NWP) to 
illustrate the role of data assimilation in practice. 

In global NWP, the primitive equations that govern the evolution of the 
atmosphere are derived from the conservation laws of momentum, energy, 
gas and water masses, together with the equation of state for ideal gases 
|18| p. 32]. Theoretical solution of these governing equations is intractable. 
Therefore, one has to discretize the governing equations to obtain a numerical 
solution instead. With discretization, a global prediction model typically has 
millions of state variables, with a resolution of 50-100 kilometer HSl pp. 13, 

n 

127] Q Because of the limitation in model resolution, there may be some 
subgrid-scale physical processes that cannot be resolved |48, Ch. 4]. 

NWP is an initial-value problem, in the sense that we need the present 
states of the atmosphere, normally called "initial conditions", to predict its 
evolution in the future. For this reason, the determination of the initial 
conditions is one of the important practices in NWP. In Fig. 11.11 we show 
a typical 6-hour data assimilation cycle of present-day operational NWP 

^The most recent configuration of a global model may have a better res- 
olution. For example, the model used by the Met Office in UK has a 
mid-latitude resolution of approximately 40 km, with the number of state 
variables in the order of 10^. For details, see the Met Office website 
http:/ /www. metoffice.gov.uk/science/creating/daysahead/nwp/um_config. html. 
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systems (after Fig. 1.4.2 of pS]). For convenience of discussion, we suppose 
that an assimilation cycle starts at time t and ends at time t + 6 (in unit 
of hours). One evolves the estimation of the initial conditions at t forward, 
so as to obtain a first guess at t + 6. Then the observations between t — 3 
and t + 3 are incorporated to cahbrate the first guess through some data 
assimilation method, for example, three or four dimensional variational data 
assimilation [IHl Ch. 5] (Note that the calibrated initial conditions have to 
satisfy the governing equations, which is the reason to include "balancing" 
in Fig. II. ip . After calibration, one obtains an improved estimation of the 
initial conditions at t + 6. With this information, on one hand, one evolves 
the improved estimation at t + 6 forward to obtain a first guess of the initial 
conditions at t + 12, and then incorporates the observations between t + 3 
and t + 9 to calibrate the first guess, and so on. On the other hand, one 
also evolves the improved estimation at t + 6 forward without any calibration 
for longer times, for example, 24-72 hours, for the purpose of operational 
weather forecasts. 

Data assimilation can also be adopted in many other fields, for example, 
climate prediction [5J, oceanography [68], hydrology [68], petroleum engineer- 
ing [66], bioinformatics [621 [91], finance and econometrics (HEl EH], to name 
but a few. In general, data assimilation practices in those fields (including 
NWP) are often confronted by the following three problems: 

• Nonlinearity: the systems under assimilation are nonlinear; 
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• Non-Gaussianity: the probability distributions of the systems under 
assimilation are non-Gaussian; 

• High dimensionality: the dimensions of the systems under assimilation 
are very high. Therefore the computational cost is very expensive. 

The first two problems, nonlinearity and non-Gaussianity, often make the 
well-established methods for linear Gaussian systems fail to attain the op- 
timal estimations of the underlying physical processes. Much research has 
been conducted in the field of estimation theory to address these two prob- 
lems. For examples, see [U EHl EOl ES] and the references therein. The last 
problem of high dimensionality, affecting the computational speed of a data 
assimilation algorithm, is an important factor for real-time applications. This 
is frequently discussed in the data assimilation community from a practical 
point of view. For examples, see [211 1251 HH] and the references therein. 

The aim of this dissertation is to study and develop some sequential data 
assimilation methods, in an attempt to address the above three problems. In 
this regard, we have three major objectives to achieve, as will be stated in 
§ II. 3[ Before that, however, we would like to introduce some concepts that 
will be frequently used in this dissertation. 
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1.2 Some concepts in data assimilation 



1.2.1 Classification of data assimilation methods 

Depending on the relative positions between the most recent observations 
and the system states to be estimated, data assimilation methods can be 
classified as three categories: predictive algorithms, filtering algorithms, and 
smoothing algorithms [72l p. 10], as will be explained below. 

Let Xj be the system state to be estimated at time i, Yk = {yk, Jk-i, ■ ■ ■ ,} 
be the collection of historical observations available up to and including time 
k, with Yj being the observation made at instant j {j < k). To use the 
information contents of to improve the estimation of Xj, 

• the estimation method is a predictive algorithm ii k < i; 

• the estimation method is a filtering algorithm if = i; 

• the estimation method is a smoothing algorithm if k > i. 

In the data assimilation community, it is customary to classify data assim- 
ilation as either a sequential and or a retrospective (non-sequential) method 
[Hj. A sequential data assimilation method is an algorithm that utilizes 
the information contents of the observations up to and including the time 
when the system state is to be estimated. This is usually used for real time 
estimation problems. In contrast, a retrospective data assimilation method 
incorporates not only observations from the past, but also those in the future 
(relative to the system state to be estimated), which is often applied to the 
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exercise of re-analysis [H]. Thus by definition, a retrospective data assimi- 
lation method is a smoothing algorithm, while in general a sequential data 
assimilation method is a combination of predictive and filtering algorithms. 

1.2.2 Dynamical and observation systems 

A dynamical system is a mathematical description of a process that "consists 
of a set of possible states, together with a rule that determines the present 
state in terms of past states" P, p. 2]. An observation system is a description 
of how the observations of a dynamical system are made. In this dissertation, 
we will follow the notations suggested by He et al. [36j as far as possible. 
We normally use the symbol x to denote a state vector and the symbol y to 
denote an observation vector. 

For illustration, let us take the following system 



as an example. Eq. (11. lap represents a dynamical system, where denotes 
the state vector at time k, is the dynamical noise, and Aik,k+i is the 
transition operator. We define the state space as the set of all possible system 
states. Thus the transition operator Aik,k+i maps a state space onto the 
state space itself. On the other hand, Eq. fll.lb|) represents the corresponding 
observation system, where TCk is the observation operator at time k, yu is the 
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corresponding observation vector, and v^, is the observation noise. Similarly, 
we define the observation space as the set of all possible observations. Thus 
the observation operator Ti^ maps a state space onto an observation space. 
In this dissertation, we normally denote the dimension of the state space by 
m, and the dimension of the observation space by m°^'" for distinction. 

1.2.3 Truth, background and analysis 

In this dissertation, the "true state" of a physical process at a given time k 
is referred to as the realization of the process at that instanlo. Following the 
convention in the data assimilation community, we often call the true state 
the "truth". 

The background at a given time k represents the prior information of the 
true state at that instant [14j. It can be considered as an analogue to the 
concept of "prior distribution" in Bayesian statistics. Similarly, the analysis 
at a given time k is the output of a data assimilation algorithm at that 
instant. In sequential data assimilation, the analysis is normally updated 
from the background by incorporating the incoming observation. An analysis 
can be considered as an analogue to the concept of "posterior distribution" 
in Bayesian statistics. 

In this dissertation, normally we will denote the truth, the background 



^If there is any randomness in the underlying physical process, the "true state" at a 
given instant may not be unique. In this case, it appears more appropriate to call state 
estimation "state tracking", following the argument in [39]. However, we will follow the 

convention and still use the term "state estimation" throughout this dissertation. 



7 



and the analysis at instant k by x^^, x^, and x^, respectively. 
1.2.4 Error statistics 

It is customary to use some offset quantities from the "true states" to describe 
the uncertainties in data assimilation. Following the convention in the data 
assimilation community [H], we normally call these offsets "errors". Let 
x^^ be the truth at instant /c, then we can define the following two types of 
estimation errors [H]: 

• Background error e\ at instant k\ = x^ — x^^. 

• Analysis error el at instant k: = x^ — x^''. 
Correspondingly, the error covariances are defined as [14j : 

• Background error covariance at instant k: = IE((e^ — Ee^)(e^ — 

• Analysis error covariance P^ at instant k: P^ = IE((e^ — Ee^)(e^ — 

where E denotes the expectation, and the superscript T means transpose. 
For convenience, we may also call P^ and P^ background covariance and 
analysis covariance, respectively. 
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1.3 Objectives and approaches 



1.3.1 Objectives 

In this dissertation we focus on studying sequential data assimilation meth- 
ods, more specifically, the ensemble Kalman filter (EnKF) and other types of 
recursive nonlinear filters for data assimilation in high dimensional systems. 
In this regard, we have three major objectives. 

One objective is to understand various sequential data assimilation algo- 
rithms from the point of view of recursive Bayesian estimation (RBE). RBE 
is a general probabilistic approach that recursively estimates the probabil- 
ity density function (pdf) of an underlying physical process over time. It 
provides a uniform framework to interpret and derive sequential data as- 
similation algorithms in various situations, as will be shown in subsequent 
chapters. 

Another objective is to introduce a few filters, including the sigma point 
Kalman filters (SPKFs) and the sigma point Gaussian sum filters (SPGSFs). 
The SPKFs [721 [82] were developed to assimilate nonlinear /Gaussian sys- 
tems. Here by "nonlinear/Gaussian" we mean the scenario, where there 
exists nonlinearity in the dynamical and/or observation system(s), and the 
underlying system states, together with the dynamical and observation noise. 



are all assumed to follow some Gaussian distribution; 



. Like the extended 



•^Similar notations like "linear/ Gaussian" and "nonlinear/non-Gaussian" will be fre- 
quently adopted in this dissertation. Their meanings shall be interpreted in a similar 
way. 
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Kalman filter (EKF) [5"^ ch. 8], the SPKFs are extensions of tlie original 
Kalman filter [l6l HT] to nonlinear/ Gaussian systems (all such extensions 
will be called nonlinear Kalman filters in this dissertation), but they are par- 
ticularly designed to attack the problem of nonlinearity without the need to 
compute the derivatives of a nonlinear function. Instead, they all require the 
generation of some special system states, called sigma points, for the purpose 
of approximations. For this reason, they are normally known as the sigma 
point Kalman filters or derivative-free filters [721182] . On the other hand, the 
SPGSFs, are extensions of the SPKFs to nonlinear/non-Gaussian systems. 
The basic idea of a Gaussian sum filter (GSF) is to use a set of Gaussian 
distributions to approximate the pdfs of the underlying system states, as well 
as the dynamical and observation noise if necessary. It can be shown that 
a GSF essentially consists of a set of parallel nonlinear Kalman filters (cf. 
Chapter EI), while a SPGSF is just a GSF that consists of the SPKFs. 

Our last objective is to increase the computational efficiency of the afore- 
mentioned filters in high dimensional systems. As the computational cost (es- 
sentially, the computational speed) is often of a concern in practice, one may 
not wish to directly apply the above filters to assimilate high dimensional 
systems. Instead, some modifications can be introduced to increase their 
computational efficiencies. For this purpose, we will present some strategies 
that aim to reduce the computational cost and/or increase the computational 
speed of the aforementioned filters. 
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1.3.2 Approaches 

We will mainly employ two approaches, namely least squares estimation 
(LSE) and recursive Bayesian estimation (RBE), to interpret and derive 
data assimilation algorithms in this dissertation. With the knowledge of 
both the dynamical and observation noise, RBE is a uniform framework that 
can be used to derive the EnKFs, the SPKFs and the SPGSFs, as will be 
shown in subsequent chapters. LSE is equivalent to RBE in linear/Gaussian 
scenarios, but in general it may differ from RBE in nonlinear and/or non- 
Gaussian cases. For this reason, in this dissertation, we will adopt RBE 
more frequently. Nevertheless, there are still some algorithms, for exam- 
ple, the Kalman filter with fading memory (cf. § I2.3.2p . that can be better 
understood from the standpoint of LSE. 

The idea of LSE is first to specify a cost function J with respect to the 
system states. J is often quadratic, but its concrete form might be case- 
dependent. The optimal estimation x"'^* of a system state x is the one that 
minimizes the cost function jQ, i.e., 

x^P* = argmin J . (1.2) 

X 

As an example, in the next chapter we will apply LSE to derive the conven- 
tional Kalman filter in discrete linear/Gaussian systems. 

To illustrate the idea of RBE, let us take Eq. (11.11) as the system under 

''in some situations, one may instead define a "benefit" function (or utility function). 
In this case, the optimal estimation is the one that maximizes utility. 
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assimilation. Let p(xfc|Yfc_i) be the prior pdf of the state conditioned 
on the observations Yfc_i = {yfe-i, yfe-2, • ■ ■ }• Once a new observation is 
available, one updates the prior pdf to the posterior pdf p(xfc|Yfc) according 
to Bayes' rule. Based on the dynamical system Eq. fll.lal) . one can compute 
the prior pdf p(xfc+i|Yfc) at the next assimilation cycle. Concretely, suppose 
that Xfc is an m-dimensional state vector in the m-dimensional real space 
at time k. One can formulate the mathematical description of RBE as 
follows [9J: 



where pijkl^k) is equal to the value of p(vfc) evaluated at v^t = y^ — Hki'^k) 
(by Eq. fll.lbp ) and conditioned on x^., and p(xfc_|_i|xfc) is equal to the value of 
p(ufc) evaluated at = x^+i — J^k,k+i{'^k) (by Eq. (11. lap ) and conditioned 
on Xfc. Note that in Eq. (11.31) . we dropped the domain of definition M"* 
of Xfc in the integrals with respect to x^ for notational convenience. This 
convention will be adopted throughout this dissertation when it causes no 
confusion. 




p(yfc|xfc)p(xfe|Yfc_i) 



(1.3a) 



/ p(yfe|xfc)p(xfe|Yfe„i)rfxfe ' 
p(xfe+i|Yfc) = / p(xfe+i|xfe)p(xfc|Yfc)rfxfc , 



(1.3b) 
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1.4 Principal new results 

This dissertation consists of some materials drawn from the following research 
works. 

Wl. Xiaodong Luo and Irene Moroz. "State Estimation in High Dimen- 
sional Systems: The Method of The Ensemble Unscented Kalman 
Filter" , Inference and Estimation in Probabilistic Time-Series Models 
(Cambridge, United Kingdom, 18-20 June 2008). 

W2. Xiaodong Luo and Irene Moroz. "Ensemble Kalman filter with the 
unscented transform." Physica D 238 (2009): 549-562. 

W3. Xiaodong Luo and Irene Moroz. "Sigma point Kalman filters for large- 
scale systems." submitted. 

W4. Xiaodong Luo and Irene Moroz. "Sigma Point Gaussian Sum Filters I: 
Theory." submitted. 

W5. Xiaodong Luo and Irene Moroz. "Sigma Point Gaussian Sum Filters 
II: Application to high dimensional systems." submitted. 

For all the works (Wl - W5), I developed the algorithms, wrote the 
codes, ran the numerical experiments, and wrote the manuscripts. 
The principal new results in these research works are: 

• In Wl I proposed the reduced rank version of the scaled unscented 
Kalman filter (SUKF). In this way, the computational cost of the SUKF 
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in high dimensional systems can be reduced. 

• W2 is an extension of the work Wl. In W2 I considered the im- 
plementation of the reduced rank SUKF in the form of a square root 
filter. 

• In W3 I reviewed two types derivative-free filters, including the SUKF, 
and the family of divided difference filters (DDEs). Apart from the 
reduced rank SUKF introduced in Wl and W2, I also proposed the 
reduced rank DDEs, in the form of square root filters. 

• In W4 and W5 I explored the idea of Gaussian sum filter (GSF). I 
used different nonlinear Kalman filters, including the ensemble Kalman 
filter, the SUKF, and the DDFs, as the base filters of the GSF. I also 
proposed an auxiliary algorithm in order to reduce the potential com- 
putational cost of the GSF and increase its stability. 

1.5 Outline of this dissertation 

In what follows we provide an outline of the whole dissertation. We will 
point out our original works with the marker * in appropriate places, while 
in the unmarked places, we are following previous works in the literature by 
default. 

In Chapter [2] we study the data assimilation problem in linear/Gaussian 
systems, which is the foundation of the data assimilation algorithms in sub- 
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sequent chapters. We apply RBE to solve the problem, which leads to the 
well-known Kalman filter. We also derive the same result from the point 
of view of LSE. This will help us to understand one important variant of 
the Kalman filter in this dissertation, namely the Kalman filter with fading 
memory (KF-FM), designed to improve the robustness of the filter. We in- 
troduce the square root filter (SRF) as another variant of the Kalman filter 
in order to increase the numerical accuracy and stability of the filter. For 
these benefits, all the nonlinear filters to be introduced in this dissertation 
will be implemented in both the forms of the KF-FM and the SRF. 

In Chapters [3] - [5] we consider the data assimilation problem in nonlin- 
ear/Gaussian systems. We review some extensions of the original Kalman 
filter to nonlinear/Gaussian systems from the point of view of RBE. We also 
develop some reduced rank versions for some of these extensions. 

Chapter [3]focuses on reviewing the ensemble Kalman filter (EnKF). There 
are two major types of the EnKFs in the literature, called the stochastic 
EnKF and the ensemble square root filter (EnSRF), respectively. In general, 
all the implementations of the EnKF in the literature can be deemed as differ- 
ent approximation schemes to approximate the integrals in RBE numerically. 
To improve the performance of the EnKF, it is customary to introduce two 
auxiliary techniques, namely covariance inflation and filtering. Covariance 
inflation compensates for the systematic underestimation of an error covari- 
ance in the EnKF. Moreover, it also makes the EnKF behave like the KF-FM. 
For these reasons, adopting covariance inflation in the EnKF often increases 
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the robustness and accuracy of the filter. On the other hand, covariance 
filtering aims to remove spuriously large correlations between distant loca- 
tions due to the effect of small ensemble size in the EnKF. Hence, adopting 
covariance filtering may also help an EnKF to achieve a better performance. 
Through some numerical experiments, we compare the performance of the 
stochastic EnKF with that of the ensemble transform Kalman filer (ETKF), 
one of the EnSRFs. We show that the ETKF consistently outperforms the 
stochastic EnKF. 

In Chapter m we review another type of nonlinear Kalman filter, called the 
scaled unscented Kalman filter (SUKF), based on the concept of the scaled 
unscented transform (SUT). One feature of the SUKF is that it does not 
require the linearization of nonlinear systems as does the extended Kalman 
filter. Instead, the SUKF tackles the problem of nonlinearity by producing 
sigma points for the purpose of approximation. This is similar to the idea 
of the EnKF and is convenient in implementation. We conduct an accuracy 
analysis for the SUKF via Taylor series expansion. In this way, we show that 
the SUKF can achieve better accuracy than the EnKF. For data assimila- 
tion in high dimensional systems, we propose a reduced rank version of the 
SUKF in order to reduce the computational cost *. Through some numerical 
experiments, we examine the performance of the reduced rank SUKF and 
compare it with the ETKF. We show that the reduced rank SUKF outper- 
forms the ETKF (as a representation of the EnKF) given the same amount 
of information. 
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In Chapter Owe review another family of nonhnear Kalman filters, called 
the divided difference filters (DDFs), based on Stirling's Interpolation For- 
mula. The DDFs also do not require the linearization of nonlinear systems 
under assimilation. Instead, like the SUKF, they generate sigma points for 
the purpose of approximation. For this reason, the SUKF and the DDFs 
are uniformly called the sigma point Kalman filters (SPKFs) or derivative- 
free filters in the literature. We conduct accuracy analyses on the DDFs via 
Taylor series expansions. For data assimilation in high dimensional systems, 
we also propose reduced rank versions of the DDFs in order to reduce the 
computational cost *. We examine the performances of the reduced rank 
DDFs through some numerical experiments. A performance comparison be- 
tween the reduced rank DDFs, the reduced rank SUKF and the ETKF is 
also presented. 

In Chapter [6] we consider the data assimilation problem in nonlinear /non- 
Gaussian systems. To this end, we introduce the Gaussian sum filter (GSF) 
as an approximate solution. A GSF essentially consists of a set of parallel 
nonlinear Kalman filters (called "base filter" of the GSF in this dissertation). 
All the aforementioned nonlinear Kalman filters, i.e., the EnKF, the reduced 
rank SUKF and DDFs, can be adopted as the base filters of a GSF. A poten- 
tial problem of the GSF is that, in some situations, the number of Gaussian 
distributions in the GSF may increase very rapidly with time. To tackle this 
problem, we suggest conducting pdf re-approximations. We propose an aux- 
iliary algorithm based on the concept of the unscented transform in order to 
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implement the above strategy *. If the GSF adopts one of the reduced rank 
SPKFs, such as the reduced rank SUKF or one of the reduced rank DDEs, 
as its base filter (the GSF implemented in this way will be called the sigma 
point GSF, or SPGSF for short), and if the GSF is implemented in paral- 
lel, then in principle the SPGSF can achieve almost the same computational 
speed as its base filter, the reduced rank SPKF. If the EnKF is chosen as 
the base filter, there will be extra costs in conducting pdf re-approximations. 
The computational speed of the EnKF-based GSF is roughly the same as 
those of the SPGSFs. We conduct some numerical experiments to examine 
the performances of the GSFs with different base filters. We show that, in 
general, the GSFs outperform their corresponding base filters. 

In Chapter [7] we conclude the whole dissertation, and summarize the main 
results that we have achieved. We also discuss some outstanding problems 
and possible extensions of the works done in this dissertation. 
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Global analysis and balancing 



Initial conditions 







Global forecast model 



6-h forecast 



> 

(Operational forecasts) 



Figure 1.1: Flow diagram of a typical 6- hour data assimilation cycle in global 
numerical weather prediction. After Fig. 1.4.2 of [48j. 
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Chapter 2 



Conventional Kalman filter for 



linear/Gaussian systems 



2.1 Overview 

The conventional Kalman filter (KF) is "an optimal recursive data processing 
algorithm" [60j for linear systems that are possibly contaminated by some 
Gaussian noise. Here by "conventional" we mean the algorithms that were 
originally developed in the pioneering works in the late 1950s and early 1960s 
by, for example, Swerling [78l[79], Kalman [16], Kalman and Bucy [47J, which 
include the scenarios w : 



lere the dynamical and observation systems are either 



The history of the conventional Kalman filter can 



^The filtering algorithm for hybrid systems, for example, a continuous dynamical sys- 
tem measured by a discrete observation system, can be derived in a similar way. For 
details see, for example, [351 ch. 7] and the references therein 
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be found in some early lecture notes [121 EO] , and the more recent textbook 



In some early works, e.g. [IB], the conventional Kalman filter was derived 
by minimizing a quadratic cost function. This is intimately related to the 
least squares estimation (LSE) [15|, [76]. One advantage of adopting LSE 
is that it is widely studied in control and optimization theories. Thus one 
may apply many well-established methods in those fields to solve the data 
assimilation problem in various situations. However, the disadvantage would 
be that, in order to achieve the optimality in different situations, one may 
have to construct different proper cost functionqj. However, there may lack 
such a systematic method that can be employed to find the proper cost 
functions in general situations, since the cost functions themselves would 
depend on the criteria of optimality in use. 

Alternatively, the conventional Kalman filter can also be derived from 
the point of view of recursive Bayesian estimation (RBE) [32j. Under the 
framework of RBE, the data assimilation problem is solved in terms of the 
posterior probability density function (pdf ) of the system states conditioned 
on the available observations (cf. Eq. (11.31) ). The advantage of RBE is that 
one does not need to specify any optimality criterion when conducting recur- 
sive Bayesian estimation. Instead, it is after obtaining the posterior pdf that 
one makes statistical inferences, for example, estimating the mean and covari- 



^Here by "proper cost function" we mean the function that wiU lead to the optimal 
solution by minimizing it. 
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ance, according to their own criteria of optimality. Thus, without involving 
any specific optimality criterion, the framework of RBE can be applied to 
various situations without any change. On the other hand, the disadvantage 
of RBE is that one has to compute some integrals, which are often ana- 
lytically intractable. Thus some numerical methods have to be adopted to 
approximate the integrals, which, however, might be computationally very 
expensive in high dimensional systems. For this reason, the issue of how 
to reduce the computational cost in approximating the integrals will be a 
frequent topic in this dissertation, as will be seen in subsequent chapters. 

The objective of this chapter is to introduce the conventional Kalman 
filter for linear /Gaussian systems as the starting point for studying the non- 
linear Kalman filters and the Gaussian sum filters in subsequent chapters. 
The derivations of the conventional Kalman filter from both the points of 
views of LSE and RBE will be presented. In addition, two variants of the 
conventional Kalman filter, namely, the square root Kalman filter (SRKF) 
and the Kalman filter with fading memory (KF-FM), will be particularly 
discussed, since they will be frequently used in subsequent chapters. 

2.2 Problem statement and solution 
2.2.1 Problem statement 

In this chapter we consider the following scenario: a linear stochastic dynam- 
ical system is driven by a Gaussian random process. The observations of this 

22 



dynamical system are made by some instrument (the observation system), 
which is also characterized by a linear stochastic system driven by a Gaus- 
sian random process. We are interested in estimating the underlying system 
states at different times. 

To avoid complicating our discussion, here we only study a specific class 
of linear systems. Later on we will consider linear systems in more general 
situations and give some hints for deriving the corresponding assimilation 
algorithms. Thus we first confine ourselves to the following class of linear 
systems: the dynamical system is a discrete-time first order Markov process 
[38| ch. 3]. The dynamical and observation noise are uncorrelated, white and 
Gaussian with zero means. Moreover, there are no input variables existing 
in the dynamical systeno. Mathematically, we can formulate the above class 
of linear/ Gaussian systems as follows: 





(2.1a) 


Jk ='Hk^k + ^k-, 


(2.1b) 


Ufc ~ iV(ufc : 0, Qfc) , 


(2.1c) 


Vfe ~ N{wk : 0, Rfc) , 


(2.1d) 


E(ujU^) = SkjQk, 


(2.1e) 


lE(vjvJ) = 6k,jRk , 


(2.1f) 


E(u,;vj) = 


(2.1g) 



^The presence of input variables is a standard setting in many textbooks (e.g., [73]). 
where the controUabiUty of a dynamical system is often a concern. 
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Eqs. f l2.1al) and (12.1bp represent the m-dimensional dynamical system 
and the m°^''-dimensional observation system, respectively, where de- 
notes the m-dimensional system state at time k, and means the m"^^- 
dimensional corresponding observation. On the other hand, the transition 
operator Aik,k-i and the observation operator Tik are m x m and m"^"" x m 
matrices, respectively. They are both independent of the system states at 
any time. 

Eqs. f l2.1cl) -( [2.1gP imply that the m-dimensional dynamical noise and 
the m°^''- dimensional observation noise are uncorrelated, white, and Gaus- 
sian with zero means. In particular, Eqs. fl2.lcp - fl2.lfj) indicate that and 
Vfc follow white Gaussian processes with the covariance at time k being 
and Rfc, respectively. The symbol "~" in Eqs. (12. Id) and f]2.1d|) means "fol- 
lowing the distribution ". The notation A^(x : /i, S) represents a Gaussian 
distribution with x being the random variable, whose mean and covariance 
are /i and S, respectively. Finally, 6k,j denotes the Kronecker delta function, 
i.e., 

I 1, if k = j, 
h,= { (2.2) 
[ 0, if k^j. 

When there causes no confusion, we may often drop the dimension infor- 
mation of the matrices and vectors in subsequent derivations. 



24 



2.2.2 Deriving the conventional Kalman filter from least 
squares estimation 

Here we mainly follow [14j to derive the conventional Kalman filter from 
the point of view of least squares estimation. Without lost of generality, we 
suppose that at the {k — l)-th assimilation cycle one has the m-dimensional 
analysis ^l-i the corresponding m x m error covariance P^-i- For con- 
venience of discussion, we divide the procedures of the conventional Kalman 
filter into two steps: propagation (or prediction) and filtering. 

2.2.2.1 Propagation step 

According to Eq. (12. lap , the expectation of the system state x^ is given 



which is normally used as the estimation of the background at instant k. 
Since we do not know Xjt_i, the analysis will be used as an estimation 
instead. Thus, the m-dimensional background x^ is estimated as 



by 



(2.3) 



(2.4) 
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The corresponding m x m background error covariance is given by 



n =E(xt-xr)(x^-xrr 



=E{Mk,k-i{^U - xti) - u,){Mk,k-i{^U - xti) - ^kf (2-5) 
=Mk,k-in-iMlk., + Qk , 



where x^^ is the truth at instant k. Note that to derive Eq. (12.51) . we have 
assumed that the analysis error ek-i = ^t-i ~ ^t-i independent of the 
dynamical noise u^. 

2.2.2.2 Filtering step 

After a new m°'"'-dimensional observation is available, one incorporates 
the new information so as to update the m-dimensional background to 
the analysis x^. To this end, one needs to find an optimal m x m°^'" weight 
matrix (normally called Kalman gain), so that the analysis x^ updated 
according to the following rule 



x^ = xt + K, (y, - n,4) 



(2.6) 



minimizes the expectation of the energy (the cost function) 



Jk = n4 



a ||2 



ir II 2 
■k II 



(2.7) 



of the analysis error = xj 



t-^^mv- 84]. 
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The reason to use Eq. fl2.6l) to update the background is because one 
would normally expect the background, the analysis, and the observation to 
be unbiased estimations, i.e., 

Ee^ = E(xt - xr) = , 

Eel = E(x^ - x*;) = , (2.8) 
Evfc = E(yfc-7^fcX*;)=0, 

where e^, denote the background and analysis errors, respectively, while 
Vfc is the observation noise at time k. 

To see the rationale behind Eq. (12.61) . one may first write the analysis 
as a linear combination of the m-dimensional background x^ and the m°^'"- 
dimensional observation such that 

x°, = C,xt + Wfcy,, (2.9) 

where Cfc and are m x m and m x m"'''" constant matrices, respectively. 
Because of the unbiasedness, by Eq. (12. 8p one has 

E(x^ - xf) = (Cfc + WkHk - I™)Ex*; = , (2.10) 

where 1^ is the m x m identity matrix, so that = Im ~ ^k^k (with 
Tik being the m°^^ x m observation operator). Substituting this identity into 
Eq. (12.91) and replacing by K^, one obtains Eq. (12. 6p . 
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On the other hand, by definition (cf . § 11.2.41) the analysis error covariance 



=E((6^-E6^)(6^-E6^f) 

(2.11) 

Thus it is clear that the cost function in Eq. (12.71) is equivalent to the trace 
of the error covariance P^, i.e., 

J, = E\\etr = E{{etfet) = TT{Pt), (2.12) 

where the symbol Tr(«) means the trace of a matrix. Consequently, the 
optimal state estimation problem in Eq. (12. ip now becomes an optimization 
problem whose objective is to 

minimize Tr(P^) over all possible weights K^. 

Subtracting the truth x^^ from Eq. (12.61) . one has 

- xr = (xt - xf) + Kfc((y, - n,^t) - i^uA - Hk^t)). (2.13) 

Thus Eq. (12.131) can be re-written as 

el = el + Kk{^^k~nkel). (2.14) 

Therefore one can obtain the analysis error covariance in terms of the back- 
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ground error covariance by noting that 



=H4i4r) - HelieimiKl + K,E(v,(v,)^)Kr (2.15) 

Note that to obtain the above resuh, we have assumed that the background 
error and observation noise are independent, so that E(e^(vjt)"^) = 

nMeif) = 0. 

Also note that = E(e^(e^)"^) is the background error covariance, and 
Rfe = E(vfc(vfc)"^) is the covariance of the observation noise, thus one can 
re-write Eq. (12.151) as 

P^ = Pi - PMkI + K,R,Kl - K.HkPl + K,n,PlnlKl . (2.16) 

Therefore the trace of P^ is given by 

Tr(P^) = Tr(P^) + Tr(K,R,K^) - 2Tr(K,H,Pt) + Tr(Kfc7^,PMKD . 

(2.17) 

Note that to derive Eq. (I2.17p . we have utihzed the fact that P^Ti^K^ is the 
transpose of K^Ti^P^, hence their traces are equivalent. 

To minimize Tr(P^), a necessary condition for an optimal weight is 
that (iTr(P^)/c/KA; = 0. For derivation, the following differential rules of 
matrix calculus will be useful [511 P- 669]: given a constant matrix A and a 
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variable matrix B, 



(2.18) 



dB 



Tr(BAB^) = B(A + A'^ 



Applying the above rules to Eq. (12.171) and noting that covariance matrices 
are symmetric, we have 



d 



■Tr(P^) = 2(K,R, - Plnl + K.HkPlni) = 



ibn-/T\ 



dK, 



(2.19) 



Therefore the optimal weight K^^* satisfies 



by assuming that Ti^P^TiJ + is invertible Q. 
Also note that 



(2.20) 



{dKk) 



2Tr(P^; 



d 



K 



opt 



k 



2in,Plnl+R,) 

(2.21) 

is positive definite, which confirms that Tr(P^) attains its minimum at K^^*. 
Substituting Eq. (12.201) into Eq. (I2.16P and with some algebra, it can be 



shown that 



(2.22) 



^Otherwise one has to adopt the generahzed matrix inverse here. 
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Remark: We note that the analysis update formula Eq. fl2.6p can also 
be obtained by minimizing the following quadratic cost function 



J(xfc) = ^(xfe-xt)^(Pt)~'(x,-xt) + ^(y,-7^fcX,)^R-^(yfc-7^feXfc). (2.23) 



For a proof, please see Appendix [Al This fact will be used later in § 12.3.21 to 
derive the KF-FM, a variant of the conventional Kalman filter. 

2.2.3 Deriving the conventional Kalman filter from re- 
cursive Bayesian estimation 

Here we mainly follow ^72], § 2.2] to derive the conventional Kalman filter from 
the point of view of recursive Bayesian estimation. Without loss of general- 
ity, we suppose that at instant k—1, one has the posterior pdf p(xfc_i|Yfe_i) = 

-1 ■ ^fc-u Pfc-i) conditioned on the observations Yk-i — {y^-i, yfc-2, ■ ■ ■ }, 
where x^_^ and P^_i are the analysis and the corresponding error covariance, 
respectively. 

2.2.3.1 Propagation step 

At the propagation step, one adopts Eq. (11.3b|) to compute the prior pdf 
p(xfc|Yfc_i) at instant k, so that 



p(xfc|Yfc_i) = j p(xfc|xfc_i)j9(xfc_i|Yfc_i)(ixfc_i . (2.24) 
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By Eqs. f E^aj) and (^J^, p(xfc|xfc_i) = A^(xfc : Mk,k-i^k-i,Qk)- Sub- 
stituting this into Eq. fl2.24p . one has 

p(xfc|Yfe_i) = j A^(xfe : A^fc,fe_iXfc_i,Qfc)A^(xfc_i : Xfc_i,P^_i)rfxfc_i . 

(2.25) 

With some algebra, it can be shown that, p(xfc|Yfc_i) follows a Gaussian 
distribution A^(xfc : x^,P^), where 

^l = Mk,k-i^t^i, (2.26a) 
Pt = PLi + Q, . (2.26b) 

The detailed deduction is provided in Appendix [B], also cf. [721 § 2.2]. 
2.2.3.2 Filtering step 

After a new observation arrives, one uses Bayes' rule to update the prior 
pdf ]9(xfc|Yfc_i) to the posterior p(xfe|Yfc), so that 



p(xfc|Yfc) =p(xfc|yfc, Yfc_i) 

_ p(yfc|xfc, Yfc_i)p(xfc| Yfc_i) 

p(yfc|Yfc_i) 
_ p(yfc|xfc)jp(xfc|Yfc„i) 
/ p(yfc|xfc)p(xfe|Yfe__i)(ixfc 



(2.27) 



The third line in Eq. (]2.27p holds because y^ is considered independent of 
the historical observations Yfc_i. 
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By Eqs. (12.1b|) and (12.1dp . p{yk\^k) = Niy^ : Hk^k,^k)- Substituting it 
into Eq. (12.271) . one has 



. IV ^ - A^(y,:7^,x,,R,)iV(x,:xt,Pt) 
Plx,| - ^^^^^ H,x,,R,)iV(x, : xtP^)c/x, ' ^'-'"^ 

Following the same rationale in Eq. (I2.25p . it can be shown that 

N{yk : Hk Xfc, Rfc)iV(xfc : x^, Pl)d^k = N{yk : Hk x^, Hk + Rfe) • 



(2.29) 

Hence with some algebra, Eq. (12.281) is reduced to 



iV(yfc :7^fcXfc,Rfc)iV(xfc :x^,P^^ 



(2.30) 



iV(xfc : 



"^fc) ^k) 



where 



x^ = x^ + K, (y, - Hk^) , 



(2.31a) 
(2.31b) 



with 

K, = PlnliHkPM + ^k)-' . (2.32) 

The deduction of the equality between the first and second lines of Eq. (12.301) 
is also provided in 172, § 2.2]. Since p(xfc|Yfc) is Gaussian, the updated 
mean x^ and covariance P^ in Eq. (I2.3ip contain sufficient information for 
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characterizing it. 



2.2.4 Summary of the conventional Kalman filter al- 
gorithm 

We summarize the recursive steps of the conventional Kalman filter as follows. 
Propagation step: 

4^Mk,k-iK-i: (2-33a) 
= Mk,k-i PLi Mlk_^ + Qk . (2.33b) 

Filtering step: 

K, = FlnliHkFlnl + Rfe)-^ , (2.34a) 



X 



^ = + K, (y, - Hk^) , (2-34b) 



Fl^Pl-K.HkFl (2.34c) 



2.3 Two variants of the conventional Kalman 
filter 

Now we introduce two variants of the conventional Kalman filter, namely the 
square root Kalman filter (SRKF) and the Kalman filter with fading memory 
(KF-FM). These two variants can benefit the performance of a filter, as will 
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be discussed below. 



2.3.1 Square root Kalman filter 

The error covariance matrices, e.g., and P^, should be symmetric and 
positive definite. However, in numerical computations, these properties may 
not be preserved due to the finite computational precision [731 ch. 6]. As a 
remedy for this problem, it is customary to use the square root Kalman filter 
(SRKF), which can be derived based on the conventional form. 

For illustration, we first re-write the covariance matrices P^ and P^ at 
assimilation cycle k as 



where and are called square root matrices of P^ and P^, respectively. 
We suppose that the dimensions of and are m x and m x m|", 
respectively, where m^'' and m^" may vary from cycle to cycle. In this way, 
in the SRKF we can propagate and update the square roots, rather than the 
corresponding covariance matrices, as to be shown below. 

At the propagation step, in order to compute based on S^_;^, we sub- 
stitute Eq. fl235D into Eq. fl2.33bp . so that 



(2.35a) 



(2.35b) 



SliSlf = {Mk,k^iSU)iMk,k-iSl_,f + Qk ■ 



(2.36) 
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where is the mxm covariance matrix of dynamical noise. Therefore, one 
can just take as a square root of the matrix 



(2.37) 



Note that by definition = G^. Here we use the notation Gk to represent 
the right hand side of Eq. (12.361) . for notational convenience. If there is no 
dynamical noise so that = 0, then one may simply let = Aik^k-iS^-i- 
Otherwise, one may adopt the following method to compute a square root 
matrix of numerically. Suppose that G^ is an m x m matrix, then by 
definition G^ is symmetric and positive semi-definite, so that we can perform 
a spectral decomposition on G^ [29l § 2.5.3]. In doing this, G^ is decomposed 
as 



where = [e^,!, ■ • • ,ej^^^s^b] is the matrix that consists of all mj!' eigenvec- 
tors Bk^i of Gfc, whose corresponding m^* eigenvalues dk^i (i = 1, ■ ■ ■ , mf') are 



diagonal consists of the positive eigenvalues dk,i of G^. Therefore, the dimen- 
sions of the matrices and are m x ml^ and m^^ x m^^, respectively. 
We then define a square root (D^)-^/^ of as 




(2.38) 



positive, and = diag{dk^i, ■ ■ ■ 



dj^ ,^sh) is the diagonal matrix whose main 




(2.39) 
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i.e., (D^)V2 



is a diagonal matrix with its main diagonal consisting of the 



square roots of dk^i {i = 1, ■ ' ' i^f)-, so that (D^)-^/^ is also an m^^ x mf 
matrix. Then we can let a square root matrix of be 



so that is an m X m^* matrix. In this way, it can be verified that S^(S^)^ = 
Gfc, and it is guaranteed that the product S^,(S^)^ is positive semi-definite in 
numerical computations. Note that, S^. obtained in this way is unique with 
respect to the method we adopt. But is in general not the unique square 
root of P^, as will be explained later. 

On the other hand, in order to compute based on at the filtering 
step, we substitute Eq. fl2.34ap into Eq. fl2.34cp . so that 



By writing the covariance matrices in terms of their square roots, we have 



(2.40) 





(2.42) 



where l^sb is the mf x mf identity matrix. 



k 




(2.43) 
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represents the projection of the square root onto the observation space. 
By definition is an m"^"" x m^* matrix (with Tik being an m"'''" x m matrix), 
while Rfc is the m"^^ x m"'''" covariance matrix of the observation noise. Thus 
one has the general solution of Eq. (12.421) given by [80] 

= StZ.U, , (2.44) 

with being an m x matrix, an m^'' x mf."" square root matrix of 
I^sb — (S^)'^(S^(S^)^ + Rfe)~^S^, and an arbitrary x matrix such 
that UfcU^ = Imi"; with being a positive integer, and 1^^" the m^" x m^" 
identity matrix. Note that it can be shown that I^sb — (S^)"^(S^(S^)-^ + 
Rfc)^^S{! is symmetric and positive definite [511 Thm. 6], thus we can also 
use spectral decomposition to compute Z^, the same as the case in computing 

at the propagation step. Therefore, here ml"' corresponds to the number 
of positive eigenvalues of I^st, — (S^)"^(S^(S^)^ + Ra,.)~^S^. Because of the 
positive definiteness, we have m|." = m^''. However, in general discussion, 
we choose to keep using the notation ml"'. Also note that, in principle 
can be an arbitrary number. But in certain circumstances, there might be 
some constraints on the choice of m^, as will be seen in § 13.3.21 In addition, 
the freedom in choosing the matrix Ujt also implies that, in Eq. (I2.44p the 
square root usually is not unique (although Z^ is unique with respect to 
the method we adopt in numerical computation). 

Analogous to the conventional Kalman filter, the steps of the square root 
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Kalman filter can be written as: 
Propagation step: 



Filtering step: 



^l = Mk,k-i^U, (2.45a) 



{Mk,k-iSt_,){Mk,k-iSl_,r + Qk, (2.45b) 
= H,Sl . (2.45c) 



K, = Sl{Stf{St{Stf + Rkr\ (2.46a) 
= + Kfc (yfc - Hk^) , (2.46b) 



= ^JI^f - (StnStiStV + Rk)-'Sl , (2.46c) 
= S^ZfcUfc , (2.46d) 



where in Eqs. (I2.45bp and fl2.46cl) the symbol vA means a square root of 



the matrix A. This can be calculated through a certain numerical scheme, 
for example, the spectral decomposition. For convenience, the information 
of the dimensions of the matrices involved in the above steps is summarized 
in Table O 



2.3.2 Kalman filter with fading memory 

In § 12.2.21 we have mentioned that, the analysis update formula Eq. (12.61) 
at the filtering step of the conventional Kalman filter can be obtained by 
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Table 2.1: Information of dimensions involved in the steps of the SRKF. 



Number 


Meaning 




m 


Dimension of the state space 




^obv 


Dimension of the observation space 




^sb 


Number of positive eigenvalues 


of 


iMk,k-iSl_,)iMk,k-iSt_,f + Qk 




ml'^ (= ml^) 


Number of positive eigenvalues 

Imf - i^kVi^ki^kV + ^k)~^Sk 


of 


Matrix 


Dimension 




Mk,k-i 


m X m 






^obv ^ ^ 




Qk 


m X m 

^obv X ^obv 






m X rrf^^ 




s| 


tb 

m X 






X mf 




Zfe 


mf' X m^" 






m^" X 






m X 





minimizing the following cost function 

J(x,) = l(x,-x^)^(Pt)-^(x,-xt) + ^(y,.-7^,.x,)^R-i(y,-7^,x,). (I22S1) 

The cost function J(xfc) consists of two terms. The first term on the right 
hand side (rhs) of Eq. (12.231) represents the information contents in the past, 
such as the initial condition xq, and the historical observations Yfc_i = 
{yi}i=o^ up to and including time k — 1. The second term represents the 
information content of the incoming observation y^, which provides addi- 
tional information to update the background to the analysis at time k. 
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In practice, more often than not, the information contained in the first 
term on the rhs of Eq. (12.231) may not completely reflect reality for some 
reasons, for example, our knowledge limit in understanding the underly- 
ing mechanism of the dynamical system, or the limit in model resolution. 
In contrast, the observation system might be better characterized and the 
observations are normally recorded with certain accuracy. In such circum- 
stances, instead of using Eq. (12.231) as the cost function, it may be better 
to put more relative weight on the second term on the rhs of Eq. (I2.23P in 
order to emphasize that one is more confident on the incoming observation. 
To this end, one can choose the following modified cost function: 

J;(x,) = i(xfc-x^)^(Pt)-^(x,-xt) + i(l+5)^(y,-7^,x,)^R,i(y,-7^,x,), 

(2.47) 

where 5 > is a non- negative scalar constant, and is called the covari- 
ance inflation factor in this dissertation 3. By choosing a cost function like 
Eq. (12.471) at each assimilation cycle, the relative weights of the historical 
information contents will drop faster than the situation without any covari- 
ance inflation. For this reason, the filtering algorithm derived with the cost 
function in Eq. (12.471) (see below) is called the Kalman filter with fading 
memory (KF-FM) [73, p. 208]. With a fading memory, the filter can be 
more robust against the inaccurate information contents in the past, e.g., 
the errors in specifying the initial conditions, or the occasional outliers of the 
^The choice of this factor will be discussed in § 13.3.3.11 
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observations in the past. 

To derive the KF-FM, one may re-write Eq. (12.471) as follows: 

J,(x,) = ^(x, - 4fiPlr\^k - 4) + liyk- Hk^kfR^'iYk - ^fcXfc) , 

(2.48) 

where Jd(xfc) = (1 + 6)^'^Jf{'Xk) is a discounted cost function of Jfi^Kk), 
and = (1 + 5)^Pfc is the inflated background error covariance. Thus the 
analysis update formula of the KF-FM is derived by minimizing Jrf(xfc) in 
Eq. (IXiHD . In the spirit of the derivation in § 12:2:21 the steps of the KF-FM 
are then given by: 
Propagation step: 

xt = ^,,fe-ixLi, (2.49a) 
Pi = Mk,k-i PLi Ml,_^ + . (2.49b) 

Filtering step: 

P^ ^ (1 + 5)'Pl , (2.50a) 
K, = Plnl{HuP\nl + Rfc)-i , (2.50b) 
x^ = xt + Kfc (yfc - HuA) , (2.50c) 
P^ = P^ - K,n,Pl . (2.50d) 

where Eq. (I2.50al) means that one conducts covariance inflation on P^, and 
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uses the inflated covariance (1 + SyP\ to replace in subsequent compu- 
tations. 

2.4 Summary of the chapter 

In this chapter we considered the data assimilation problem in a specific class 
of linear/Gaussian systems. The solution to the problem, which turned out to 
be the well-known Kalman filter, was derived from both the points of views of 
least squares estimation (LSE) and recursive Bayesian estimation (RBE). We 
also introduced two variants of the conventional Kalman filter, namely the 
square root Kalman filter (SRKF) and the Kalman filter with fading memory 
(KF-FM). The SRKF was introduced to improve the numerical precision of a 
filter, while the KF-FM was designed to improve its stability (or robustness). 
As will be shown in subsequent chapters, in practice one can implement these 
two variants simultaneously to improve the performance of a filter. 

Before closing this chapter, we would like to give some hints or references 
for deriving data assimilation algorithms for linear systems that do not fall 
into the category described by Eq. (12. ip : 

• One can convert a higher order Markov process into a first-order one by 
introducing some argumented variables. This is similar to the idea of 
converting a higher order scalar autoregressive (AR) process into a first- 
order vector autoregressive (VAR) process [Sni ch. 2]. For example, to 
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write 



into the form of a first-order Markov process, we define the argumented 
variable 

Xfc 

Thus we have the desired form given by 



7 



I 











Xfc-1 + 









where I denote the identity matrix. 

The conventional Kalman filter for continuous systems can be derived 
by letting the time steps of discrete systems tend to zero [TSJ ch. 8]. The 
filter for hybrid systems (e.g., continuous dynamical systems observed 
by discrete ones) can be obtained in a similar way, although one may 
also derive the algorithm from other points of views. For example, see 
ch. 7]. 



In Eqs. (12. lap and (I2.1b|) . the dynamical noise and the observation noise 
can also be correlated and/or Gaussian coloured. One can generalize 
the conventional Kalman filter to accommodate such situations. For 
details, see [721 ch. 7]. 
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Chapter 3 

Ensemble Kalman filter for 
data assimilation 

3.1 Overview 

In the previous chapter, we derived the conventional Kalman filter based on 
two fundamental assumptions, namely, the linearity of the dynamical and 
observation systems and the Gaussianity of the dynamical and observation 
noise. In practice, these two assumptions are often violated. Moreover, a 
practical aspect not mentioned previously is the computational cost, which 
may not be a problem for low dimensional systems, but will be an important 
factor in consideration when assimilating high dimensional systems like a 
weather forecasting model. Thus in this and the next few chapters, we will 
introduce some filters that are designed to tackle some, if not all, of the 
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following problems: nonlinearity, non-Gaussianity, and high dimensionality. 

In this chapter we focus on studying the ensemble Kalman filter (EnKF) 
initially proposed in [23]. The EnKF is essentially a Monte Carlo implemen- 
tation of the Kalman filter (see |16] for a rigorous proof). Suppose that, at 
the beginning of each assimilation cycle, one has an ensemble of the back- 
ground (called background ensemble), usually obtained from the previous 
assimilation cycle. Then, with an incoming observation, one applies the KF 
scheme to update each individual member of the background ensemble. To do 
this, the mean and error covariance of the background are approximated by 
the sample mean and sample covariance of the background ensemble, so that 
one can apply Eqs. fl2.34ap and (I2.34bl) to obtain an ensemble of the analysis. 
The analysis ensemble is then used to estimate the mean and covariance of 
the underlying system states. By propagating the analysis ensemble forward 
through the dynamical system, one obtains a new background ensemble for 
the next assimilation cycle. In this way, by using only a small ensemble to 
evaluate the statistics (mean and covariance) at both the propagation and 
filtering steps, the computational cost of the filter can be reduced [23] . 

Depending on whether to perturb the observations or not, the EnKF 
can be classified into two types: stochastic and deterministic [IHIEO]- The 
stochastic EnKF uses the incoming observation and the covariance matrix 
of the observation noise to produce an ensemble of perturbed observations, 
which are then used to update the background ensemble. For examples, see 
[151 [231 [Ml [251 [261 [33]. In contrast, the deterministic EnKF, often known 
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as the ensemble square root filter (EnSRF), does not perturb the incoming 
observation. Given a background ensemble, the EnSRF uses the incoming 
observation to update the sample mean of the background, while the analysis 
ensemble is taken as the sample mean plus some perturbations derived from 
the updated square root of the analysis error covariance. For examples, see 
[SI [131 [SS] , also see the reviews in [211 (251 [HO] • Apart from the aforementioned 
EnKFs, there are also some other variants in the literature. For examples, 
see [IIlinKHIKM]. 

In this chapter we first present the mathematical descriptions of both 
the stochastic and deterministic versions of the EnKF. We introduce two 
auxiliary techniques, namely, covariance filtering and infiation, which are 
useful for improving the performance and robustness of the EnKF. Finally, 
we use a 40-dimensional system as the testbed to examine the effects of some 
parameters on the performance of the EnKF. 
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3.2 Problem statement and a Monte Carlo 
approximation to the solution 

Similar to Eq. fl2.ip . we consider the data assimilation problem in the follow- 
ing scenario: 

Xfc = A^fc,fc-i (xfc_i) + Ufc , (3.1a) 

Yfc = T^fc (xfc) + Vfc , (3.1b) 

Ufe ~ AT (ufc : 0, Qfc) , (3.1c) 

Vfc ~ iV (v,, : 0, Rfc) , (3.1d) 

E {ujul) = 6k,jQ,k , (3.1e) 

E(v,v^) =4,iRfc, (3. If) 

E(u,vj)=0 V^,j. (3.1g) 

Note that, in general, the systems in Eq. (13. ip are different from those in 
Eq. (12. ip . since the dynamical system Eq. ( 13. lap and the observation system 
Eq. ( I3.1bp are both possibly nonlinear, i.e., J^k,k-i and Hk are possibly 



nonlinear functions. Again, we assume that the dimensions of the state 
space and the observation space are m and m°^'", respectively. But when 
there causes no confusion, we may often drop the dimension information. 

Eq. IKIB and (l3ldl) mean that we assume both the dynamical and obser- 
vation noise are Gaussian. This may not always be realistic in practice. But 
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at the moment let us be content with this assumption. Later in Chapters E] 
we will address the issue of non-Gaussianity, where the main idea is to con- 
duct pdf approximations. Also note that by Eqs. (13. lei) - p.lgp , we assume 
again that the dynamical and observation noise are white and uncorrelated. 

Before proceeding to introduce the details of different versions of the 
EnKF, we would like to give an outline of the Monte Carlo approximation 
to the solution of the data assimilation problem in Eq. (13. ip . from the point 
of view of recursive Bayesian estimation (RBE). In doing this, one may see 
the rationale behind the EnKF. 



3.2.1 Propagation step 

We suppose that at the {k — l)-th assimilation cycle, one has an n-member 
analysis ensemble X^_^ = rather than the posterior pdf p (xfc_i|Yfc_i). 

Thus to use Eq. (11.3bp 

p(xfc|Yfc_i) = J p(xfc|xfc_i)p(xfc_i|Yfc_i) (ixfc_i (11.3bD 

in RBE to compute the prior pdf p (xfc|Yfc_i) at the next assimilation cycle, 
one approximates p(xfc_i|Yfc_i) in terms of the analysis ensemble X^_^ by 
[721 ch. 7] 

1 " 

p(xfc_i|Yfc_i) ^ - ^5 (x,_i - x^,J , (3.2) 

i=l 
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where S is the Dirac delta function, so that 



-foo, if X = , 
S{^) = { (3.3) 

0, otherwise , 



and 

V(x)5(x-c)rfx = /(c) (3.4) 



for a function / and constant c. 

Also note that by Eqs. fl3.1ap and (I3.1cp . one has 

p (xfc|xfc_i) = (xfc : Mk,k-i (xfe_i) , Qk) ■ (3.5) 

Substituting Eqs. (13. 2p and (13.50 into Eq. (Il.3bl) . one has 



1 " /■ 

p (xfc|Yfc_i) ~ - / ^ (xfc : Mk,k-i (xfc_i) , Qfe) 5 (xfc_i - Xfc_i_i) dxk-i 
^ i=i 

1 " 

= -J2N{^k:Mk,k-i (xLj,Q,) , 

1=1 

(3.6) 

which is the sum of a set of Gaussian pdfs. 

To evaluate the mean x and covariance P of a Gaussian sum pdf p (x) 
given by 

n 

p (x) = ^ cAT (^x : x„ P,) , (3.7) 
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which contains a set of Gaussian pdfs | : x^, P^^ | with the normal- 

n 

ized weights {c^}"^^ {Yli^s = 1), the following formulae ch. 8] will be 

s=l 

useful: 



n 

X 



^c,x, , (3.8a) 

s=l 

n 

P = XI (Ps + (x - X,) (x - x,)"^) . (3.8b) 



s=\ 



Applying Eq. (13. 8p to Eq. (13.61) . one has the estimated mean x^ and covari- 
ance P^ of the background at the /c-th assimilation cycle given by: 

xt ^ = (x^i , 2 = 1, 2, • • • , n , (3.9a) 

n 

^'^ = -Y.<^^ (3.9b) 

i=l 
1 " 



n 

i=l 



where x^ ■ are the forecasts of the propagations of X^_j^ = {^k-i,i}^-i- 
3.2.2 Filtering step 

After a new observation jk is available, one updates the prior pdf p (xfe|Yfe_i) 
to the posterior p (x^jY^) according to Bayes' rule Eq. (I1.3ap : 



p (xfe Yfe) = r-i— . lL3aJ) 

j p(yfc Xfc)p(xfc Yfe_i) dxfc 
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By Eqs. fl3lB and fIXIdll . 



p(y,,|xfe) = iV(yfc :7^fe(xfe),Rfc) . (3.10) 

In evaluation, we approximate p (x^l Yfc_i) by a Gaussian pdf ^x^ : x^, P'^j , 
where x^ and are the estimated mean and covariance of the background 
given in Eq. (13. 9p . Doing this imphes that we assume the underlying sys- 
tem state Xfc follows (or can be approximated by) a Gaussian distribution. 
Therefore, Eq. (I1.3ap is reduced to 



NiYk-Hk (xfe),R,)iv(x, ix^P^) 

P (xfelYfc) = ^ , (3.11) 

/ iV (yfe : (xfc) , Rfc) N (^Xfc : x^, P^ rfx^ 

If the observation operator Tik is nonlinear, the pdf p(xfc|Yfc) may not 
have a closed form as in linear/Gaussian systems (cf. Eq. fl2.28p in the 
previous chapter). As an approximation, one may choose to linearize Tik 
first, and then apply the identity in Eq. (I2.30p to obtain an approximate 
closed form for p (x^IYa;). 

Concretely, one first expands Hk (x^) so that 

Hk (x,) = Hk (x^) + H,| - . 5xfc + h.o.t , (3.12) 



where H^l^t denotes the Jacobian matrix of Tik evaluated at x^, (5xfc = x^ — 
x^,, and "h.o.t" represents the higher order terms of Taylor series expansion. 
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If the perturbation ^x^ is small, or is weakly nonlinear such that the 
higher order derivatives of Tik evaluated at are small, one may discard 
those higher order terms and approximate Tik (x^) by 

Haxfe)~Hfc(xt)+Hfc5x, 

(3.13) 

= Hfcx,. + [Hk (xt) - Hfext) • 

For notational convenience, we dropped the localization information of the 
Jacobian of Tik in Eq. (13.131) . Substituting Eq. (13.131) into Eq. (I3.10p . we 
have 

N (yfc : Hk (xfc) , Rfc) 

^N{yk: HfcX, + [Hk (x^) - RfcX^) , Rfc) (3-14) 
= iV(y*^:HfcXfc,Rfc) , 

where y^^ = Yk — {'Hk (x^) — HfcX^) is a translation of the observation y^. 
Therefore we can approximate p{'Xk\Yk) by 

N{yi^:Hk^k,Rk)N(^k:^i,Pi) 

p(xfc|Yfc) ^ f 

/ N (y*- : HfcXfc, R^) N (^x^ : x^, P^j dx^ (3.15) 

= iv(xfc:x^,P^) , 
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where 



= + K^yr - H,xt) (3.16a) 

= yi + K,{y,-nk (xt)) , 

P^ = Pt-K,H,Pt (3.16b) 

K, = Plul (u.Piul + Rfc) , (3.16c) 

are obtained in the spirit of Eq. (12.301) . 

To carry out ensemble forecasting at the next assimilation cycle, one also 
needs to generate an ensemble = analysis as the samples 

of the pdf ^Xfc : x^, j . The approach to generating the analysis ensemble 

is called analysis scheme. Different implementations of the EnKF may 
have different analysis schemes, as will be shown below. 



3.3 Implementation of the ensemble Kalman 
filter 

3.3.1 Stochastic ensemble Kalman filter 

Given a background ensemble X^ = {x^ • : x^ ■ = Mk,k~i (^fc-i.i) j^^iQ, in 
principle the computations of the sample means and covariances of the back- 

^More precisely, is the ensemble of the forecast of the analysis ensemble at the 
previous cycle. However, for brevity, we choose to call it the "background ensemble" in 
this dissertation unless otherwise stated. 
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ground and analysis can just follow Eqs. fl3.9l) and fl3.16l) . respectively. But in 
the literature there may be some different ways in computing or approximat- 
ing those statistics. For example, at the propagation step, when computing 
the background covariance P^, one may use the following formula 

1 " 

= ^ E - -^if+Qk, (3.17) 

1=1 

which differs from Eq. fl3.9cp in that the factor before the summation is 
l/{n — 1), rather than 1/n. The factor 1/n used in some works (e.g. [S7] ) 
represents the maximum likelihood estimation of the background covariance, 
while the factor l/{n — 1) used in others (e.g. [33]) represents the unbiased 
estimation. The difference between these two estimations is not significant 
even for a small number n (say, around 10), thus we do not particularly favor 
either the criterion of maximum likelihood or unbiasedness. However, since 
a larger background covariance may benefit the performance of the filter (see 
the discussion in § I3.3.3.ip . we will normally adopt the unbiased estimator 
in this dissertation unless otherwise stated. 

Another point worth mentioning is that, in practice, it may not be con- 
venient to evaluate the Jacobian of a nonlinear function in a multivariate 
scenario. Thus in order to evaluate the Kalman gain and the sample 
covariance of the analysis in Eq. (I3.16p . we adopt the following approxi- 
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mations [51] : 



Kfc = P-(pr + R,)"' , (3.18a) 



T 



Pl = Pl-K,{PZ) , (3.18b) 



with 



" =1 

n E - (^^^ (xt,) - y.)^ , (3.19b) 

1=1 

— E (^^ W.) - (^'^ - y^^)^ ■ (^-is^) 



■per 



n . 

1=1 



" n . 

i=l 



P^'' in Eq. ( I3.19b|) is the (sample) cross covariance between the background 
ensemble and its predicted projection onto the observation space, while P^'^ 
is the (sample) covariance of the predicted projection of the background en- 
semble onto the observation space. Hereafter we will call P^^ and P^'^ cross 
covariance and projection covariance, respectively. Note that, Eq. fl3.19al) 
represents an unbiased estimation of the mean of the projection of the back- 
ground ensemble onto the observation space. In the literature, there may be 
other ways for estimation. For example, in [91] Eq. (I3.19ap is replaced by 



Yk = Hk (xt) , (3.20) 
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which is then used for subsequent computations in Eqs. (13.19bp and fl3.19cp . 



In doing this, Eq. fl3.20p represents a maximum hkehhood estimation of the 
mean of the projection of the background ensemble onto the observation 
space. If the observation operator Tifc is linear, then Eq. fl3.19ap and fl3.20p 
are equivalent. Otherwise they are different in general. Thus which equation 
to choose may depend on the favour of the user. In this dissertation, we 
prefer to using Eq. fl3.19ap . Because in doing this, the similarity between 
the stochastic EnKF and the scaled unscented Kalman filter (SUKF) to be 
introduced later will be more clear by comparing Eq. (13.190 with (14.430 in 
the next chapter. 

To generate the analysis ensemble, the stochastic version of the ensemble 
Kalman filter (stochastic EnKF hereafter) needs to produce some surrogate 
observations = {y^ where ^ are the samples drawn from the 

Gaussian distribution with mean y^ and covariance R^. The analysis en- 
semble = consists of the updates of the sample mean of 
the background according to Eq. (I3.16ap . but with the observation therein 
replaced by the surrogate observations Y^. Concretely, one has 

x^, = ±l + Kk {yl, - Uk (xt)) , z = 1, ■ • • ,n. (3.21) 

To summarize, the implementation of the stochastic EnKF contains the fol- 
lowing procedures: 
Propagation step: 
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xt,, = Mk,k^i (x^i J , z = 1, ■ ■ ■ , n , (3.22a) 
1 

i=l 

1 

y'^ = -E^'^K.)' (3.22c) 

2=1 

— E - ^9 (4. - + Q. , (3.22d) 

i=l 
1 " 

P^:^ = ^ E - ^'fc) (^'^ K.) - y^)^ ' (3.22e) 



' ~n 

1=1 



i=l 



^ E (^'^ K.) - y^) K.) - y'^)^ ■ (3.22f) 

Filtering step: 

Kfc = P^/(pf + , (3.23a) 

x^ = x^ + K,(y,-7^,(xy) , (3.23b) 

P^ = Pt-K,(p-)^. (3.23c) 

Analysis scheme: 

y^.^Ar(y-y,,R,) , (3.24a) 

x^, = xt + Kfc (y^, - T^fc (xt)) , 2 = 1, ■ ■ ■ , n , (3.24b) 

where Eq. f l3.24al) means that the surrogate observations y^^ are samples 
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drawn from the Gaussian distribution (y^ : y^, R^). Also note that in prac- 
tice, it is not necessary to evaluate the covariances and in Eqs. ( 13.22dp 
and fl3.23cp for computations. However, here we still choose to list them for 
completeness. 

Remark: Sampling the Gaussian distribution N {yl : yfc,Rfc) normally 
brings some sampling errors. This causes a problem in that the sample 
covariance computed based on the analysis ensemble = {x^ ,;}"__^ may 
not be the same as the targeted covariance given by Eq. fl3.23cp . Instead, 
it was shown in [SH] that the sample covariance computed based on the 
analysis ensemble = {x^^}"_^ will underestimate the targeted covariance 
in Eq. f l3.23cl) due to the effect of finite ensemble size, which may cause the 
divergence of the EnKF. As a remedy for this problem, we will proceed to 
introduce a different implementation of the EnKF based on the concept of 
square root Kalman filter (SRKF) in § 12.3.11 We will also introduce the idea 
of covariance inflation in § 13.3.3.11 in order to compensate for the covariance 
underestimation. 

3.3.2 Ensemble square root filter 

As aforementioned, generating surrogate observations will introduce sam- 
pling errors to the EnKF. To overcome this problem, a simple idea is to 
avoid perturbing the observations. Suppose that at each assimilation cycle, 
one wants to generate n-member ensembles for both the background and 
analysis. In order to generate an analysis ensemble with its sample covari- 
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ance matching that in Eq. fl3.23cl) . one may use the sample mean and a 
square root of the m x m covariance to generate the analysis ensemble 

= This leads to the ensemble square root filter (EnSRF) in 

the literature PUSKHnj. 

For illustration, suppose that one has a square root of the covariance 
P^, which is updated from a square root of the background covariance 
P^ according to a certain rule. Then the analysis ensemble is generated 
according to the following formula, 



c^ + v/rr^(S^),,2=l,2,---,n, (3.25) 



where (S^)^ denotes the i-th column of the square root matrix S^. Note 
that in Eq. fl3.25p . the sample covariance of the analysis ensemble {x^ 
is equivalent to P^. However, its sample mean may not be equivalent to x^ 
unless 

n 

E(Sa = 0. (3.26) 

i=l 

Failing to satisfy Eq. (13.261) means that, compared with the mean of the 
analysis evaluated by Eq. fl3.23b[) . there is a bias in the sample mean of 



the analysis ensemble produced by Eq. (I3.25p . which may cause covariance 
underestimation, as reported in ^51j. For this reason, the ensemble filters 
satisfying the constraint in Eq. (13.261] , called unbiased ensemble filters in 
|51j . is favored in this dissertation. 

The propagation and filtering steps of the EnSRF may follow those of the 
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square root Kalman filter in § 12.3. 1[ Concretely, let the background ensemble 
= {^k,i}^-i be the forecasts of the propagations of = {^k~i,i}^-i 

(cf. Eq. f l3.22al) ). then the sample mean and covariances P^. of the 
EnSRF are exactly the same as those in Eq. (13.221) . Here we suppose that 
the rank of the background covariance is m^^, which is determined by both 
the background ensemble and the covariance of dynamical noise. In 
the case that there is no dynamical noise, = n — 1 given n independent 
ensemble members [87]. But with the existence of Q^, in general m|'' > n — 1. 
Again, for numerical reasons, it is customary in the EnSRF to re-write the 
covariances in terms of their square roots [6l [131 EH] ■ 



per 



^k ~ 



(3.27a) 
(3.27b) 
(3.27c) 



where S^*, and are square root matrices satisfying 



Sxb 
h 



\/n — 



^ki 



1 ■^k,n 



Sf (Sf)^ + Q, 



Sfc=^/c(Sfc)= 7^fe((St)i), ■ ■ ■ 



, HkiiS 



kJmi 



(3.28a) 

(3.28b) 
(3.28c) 



with {S\)i being the i-th column of S^, and S^^, and are m x n, 
m X mf, and m"''" x mf square roots, respectively, where m"''" represents 
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the dimension of the observation space. Note that in Eq. (13.28bp . because 
S^^ (Sfc'')"^ + Qfc is positive semi-definite, one can adopt the method discussed 
in § 12.3.11 to calculate S^, so that it is guaranteed that the numerical result 
of the product Sl^S^.)^ is positive semi-definite. In particular, if there exists 
no dynamical noise, then it is customary to take S^^ in Eq. (I3.28ap as the 
square root of P^,. For examples, see P, [131 189] . 

Analogous to the situation in § 12.3. H the Kalman gain and the square 
root of the sample covariance are expressed in terms of and S^, so 
that 



where Ufc is a matrix satisfying Ufc(Ufc)"^ = I^f ? with I^st being the 
dimensional identity matrix (cf. § 12.3.11) . In Eq. fl3.29ap . the Kalman gain 
Kfc is an m X m°^'" matrix. The dimensions of T^, and Ufc will be given 
below. 

In the literature, there are different implementations of the EnSRF. For 
examples, see [HI [13], [HE] . Essentially, these implementations differ from one 
another only in the choice of the matrix [80j. For this reason, in this 
dissertation, we do not intend to give a detailed review of all these differ- 
ent versions of the EnSRF. Instead, we just choose the ensemble transform 




(3.29b) 





(3.29c) 
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Kalman filter (ETKF) proposed in [13] as the representative. In the ETKF 
[13j, one chooses 

T, = Er(Dr + I„.)"'^', (3.30) 



where is an mf^m'^j^ matrix. The matrices E^^^ and D^^** are constructed 
in the following way. Let p^^'^ be an mf^mf weighted projection covariance, 
so that 

^wpr ^ ^gfe^ HfcR^ ^HfcS^ ^ (Sfc) Rfc ^S^, (3.31) 

where Hfe is the Jacobian of the observation operator Tik evaluated at x^,. We 
perform a spectral decomposition on p^*"" so as to obtain a set of eigenvectors 
{eA;,j}^\ and the corresponding eigenvalues {rffc^jj^i • Then E^^^ is an mf x 
raf matrix consisting of the eigenvectors {ek,i\T=i so that 



Ewpr 
k 



(3.32) 



while D^^^ is an m^* x m^* diagonal matrix consisting of the corresponding 
eigenvalues so that D^^^ = diag((ifc^i, ^^^2, ■ ■ ■ ^dk^mf)- is shown 

in [13] that, if the observation operator is linear, then 



T,(T,)^ = I^.. - (S^)^ (S^ (S^)"^ + R.) ' St . (3.33) 

The details of the deduction are given in [isj^. Also, it can be shown that 

2 As a hint, one may put (S^)^R^'/' = ^^p' {p^P-y/^ ^ so that on the rhs of Eq. 
{SirmS\r + B.k)-'Sl = E^Dr (or +I™-)-^(Er)'^- AIso Note that I,„,. = 
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Tfc in Eq. fl3.30p is a non-singular solution to Eq. fl3.33p \5T; Thm. 6], in the 
sense that the product Tfe(Tfc)^ in Eq. (13331) . with given by Eq. (I330|) . 
is positive definite. 

Originally, the ETKF proposed in [13] lets = I^st in Eq. fl3.29cp . 
which may cause a bias in evaluation of the sample mean ic^, since there 
is no guarantee that the square root obtained in this way satisfies the 
constraint in Eq. (13.260 . Here we follow a revision of the ETKF proposed in 
[87] , and derive the matrix Uj. from the concept of spherical simplex unscented 
transform [^Tl HI] . Concretely, is chosen as an m^* x [ml^ + 1) matrix in 
the following form (cf. Eq. (C15) of [87]): 

1 1 
••• 

1 1 i „ 



^/^i^TT) ^%{% + 1) ^%{% + 1) 



^mf{mf + 1) \/mf{mf + 1) ^mf{mf + 1) 



(3.34) 

(D^'- + 1^,. ) (D7'' + " ' )^ ■ Then it can be verified that Tfe in Eq. (IH^Ol) 
is a sohition of Eq. (|3.33p . 
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which satisfies [57] 



ua = 0, 



(3.35a) 



(3.35b) 



where 1 in Eq. fl3.35ap denotes the (^^^ + 1) x 1 vector whose elements 
are all equivalent to 1. In this way, the square root in Eq. fl3.29cl) is 
an m X (m^^ + 1) matrix. In the case that there is no dynamical noise, 
= n — 1. Therefore, when generating the analysis ensemble according 
to Eq. (13.251) . one obtains m^^ + 1 = n members, the same as that of the 
background ensemble. But if there exists dynamical noise, one may have 
m|.* > n — 1, so that the number m^^ + 1 of the analysis ensemble members is 
larger than n. In this case, in order to prevent the number of the ensemble 
members from growing at each cycle, one may have to reduce the number of 
the column vectors of S^. One such a possible strategy will be discussed in 
§ 14.61 of the next chapter. 

For convenience, we summarize the procedures in the EnSRF as follows: 
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Propagation step: 



Filtering step: 



M 



k,k-l 



(4-1.) 



1 



}xb 



i=l 
1 



s^ = VSf (sf) +Qfc, 



^fc((S 



'^fc((Sfe)mf.''> 



(3.36a) 
(3.36b) 

(3.36c) 

(3.36d) 
(3.36e) 



x^ = x^ + K, (y^-T^fc (xt)) . 

-1/2 



E 



(3.37a) 
(3.37b) 
(3.37c) 
(3.37d) 



Analysis scheme: 



x^,, = x^ + v/^(Sa,^ = l,2, 



, n. 



For convenience, the dimension information of the matrices involved in the 



EnSRF is listed in Table EH 
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Table 3.1: Information of dimensions involved in the ETKF. 



Number 


Meaning 


m 


Dimension of the state space 


^obv 


Dimension of the observation space 


mf 


Number of positive eigenvalues of 


Sf {Sff + Q, 


n 


Number of the ensemble members of 




both the background and the analysis 


Matrix 


Dimension 


Qfe 


m X m 




X m"^ 




m X m°^^ 



Si X mf 

Tfc mf X mf 

mf X mf 
mf X mf 
mf X mf 

Ufe mf X {mf + 

mx {mf + 1) 



3.3.3 Two auxiliary techniques to improve the perfor- 
mance of the ensemble Kalman filter 

Two auxiliary techniques are often adopted in many applications of the EnKF 
in order to improve the performance of the filter. One technique is covariance 
inflation [H [671 IM] , which is related to the Kalman filter with fading memory 
(KF-FM) in § 12.3.21 as will be shown below. The other is covariance filtering 
(or covariance localization) [31]. 
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3.3.3.1 Covariance inflation 

Covariance inflation is a method that one artificially increases the error co- 
variance of the background or the analysis at each assimilation cycle [8], [671 
[55] . The rationale behind covariance inflation may be explained from the 
following points of views. On one hand, when adopting the EnKF for data 
assimilation, the error covariance of the system states (either the background 
or the analysis) will be systematically underestimated due to the effect of 
finite ensemble size [89]. Therefore, it is natural to introduce covariance 
inflation for compensation. On the other hand, one may note that for data 
assimilation in nonlinear systems, the EnKF is only an approximate solution. 
Therefore, even in the ideal situation where there are no other sources of er- 
rors in the systems under assimilation, there may be still an algorithmic error 
in the EnKF, which may cause offsets in our estimations. Therefore, one may 
follow the argument in § 12.3.21 to artificially increase the error covariance of 
the background by a factor, which in effect will increase the relative weight 
of the incoming observation, and thus improve the robustness and accuracy 
of the EnKF. 

In the EnKF, given a background ensemble = {x^^}"_^ with the 
sample mean x^, conducting covariance inflation is equivalent to replacing 
each ensemble member ^ by x^ + (1 + 5) (x^ ■ — x^) , where 6 is the inflation 
factor. In this way, the sample mean x^ of the background ensemble 
remains the same, but the sample error covariance is increased by a factor of 
(1 + 5)^. Given an analysis ensemble, covariance inflation can be done in a 
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similar way. 

How to optimally choose the value of the inflation factor S is still an open 
question. In many works, e.g. [Bl EZl EH], S was often heuristically chosen 
as a constant when running a data assimilation algorithm. In a more recent 
work, Anderson [7j proposed a spatially and temporally adaptive method to 
choose S, which treats 6 as the variable of a random process. Thus one can 
update 6 at each assimilation cycle according to a data assimilation algorithm 
(e.g. the EnKF) by treating 5 as a hidden (or unobserved) state variable of 
the dynamical system. However, one possible problem of this method is 
that normally one may not have the exact knowledge to specify the random 
process with respect to 6. 

3.3.3.2 Covariance filtering 

The error covariances of the EnKF are often evaluated based on small-size 
ensembles. For this reason, there may exist spuriously large correlations 
between distant locations in the practice [31j. To address this problem, one 
may introduce a distance-dependent filter to the error covariance, so that the 
correlations between two distant locations are set to zero. For this purpose, 
the Schur product can be applied to an error covariance. Mathematically, 
the Schur product, C = A o B, of two matrices A and B with the same 
dimensions, is defined as the matrix with the same dimension as A and B, 
whose components Cij = AijBij, where Aij, Bij and Cij are the components 
on the i-th row and the j-th column of the matrices A, B and C, respectively. 
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For our problem, we suppose that A is an error covariance matrix, and B 
is the matrix introduced to taper A so as to reduce the spuriously large 
correlations in A. Since A and the tapered matrix C = AoB are covariance 
matrices, they shall both be positive semi-definite. As a result, we also 
require that B be at least positive semi-definite [TOl Lemma 3.7.1]. 

For convenience, we call B the taper matrix hereafter. The construction 
of B can be done in the following way: 

B^J = p{d,j), (3.38) 

where dij is a metric measuring the difference between the locations i and 
j, and p is chosen to be a function of -positive type [69, p. 299], so that it 
guarantees that the taper matrix B is positive definite, as a result of the 
Bochner's theorem [69, p. 300]. Several examples of such a function p were 
discussed in [27j. In this dissertation, we follow ^34j and choose function p 
in the following form: 



_1^5 + i^3 + 5 3_ 5 2 ifO<z<l; 

4 2 8 3' - - > 

1^5_i^4^5 5 2 ifi<;,<2; (3.39) 

12 2 8 3 3' 

, if 2 > 2 . 



For illustration, the shape of the function p is plotted in Fig. (13.11) . As one 
can see there, the function p has a "cut-off" effect at z = 2, in the sense that 
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Figure 3.1: Shape of the function p in Eq. (13.391) . 

the values of the function are set to zero for all z > 2. 

For data assimilation in real world, dij is normally a function of the dis- 
tance between the available observation sites i and j in the three dimensional 
physical world. For example, see [IB]. However, in mathematical analysis, 
this choice might not always be available. For example, the system states of a 
mathematical model may not have any physical meaning, so that we cannot 
observe them in the physical world. On the other hand, for a mathematical 
model, the (physical) distance between indices (or locations) i and j may 
also not be well defined as in the physical world. For these reasons, we follow 
to conduct covariance filtering in the following way. 
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Concretely, we suppose that A is the covariance of an m-dimensional 
random variable x = [xi, ■ ■ ■ ,Xm]- For convenience, we assume that the 
mean E(x) = 0, so that 

A= [rf,---,ra^, (3.40) 

where 

Ti = [E{XiXi), ■ ■ ■ ,E{XiXm)] (3.41) 

is the i-th row of A. We define 

d,, = \\rj -rjyic, (3.42) 

where Ic is a length scale that is introduced to influence where the "cut-off" 
effect of the function p takes place [fj. With some algebra, it can be shown 
that 

dij = y ^ (E((X, - Xj)Xi)f + ■■■ + (E{{Xi - Xj)Xm)f ■ (3.43) 

In this sense, dij is a metric in measuring the statistical difference between the 
random variable Xi and Xj in the m-dimensional state space. In particular, if 

■^Note that in Eq. (|3.42|) . by choosing the row vectors to calculate the distances dij, we 
have implicitly assumed that the number of the rows of a matrix (not necessarily square) 
is larger than or at least equal to the number of its columns. If this is not the case, then 
it is suggested to choose the column vectors to calculate the distances dij instead. In this 
way, covariance filtering can be applied to non-square matrices like the cross covariance 
(when the dimension of the state space is not equal to that of the observation space, cf., 
for example, Eq. p.lQbl) ). 
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the index i = j or if the length scale Ic = oo so that dij = 0, then p (dij) = 1, 
which implies that in effect there is no tapering effect. But if i ^ j so that 
Xj and two different random variables, d^ is positive and covariance 

filtering will take place. 

Here we use a numerical example to illustrate the effect of covariance fil- 
tering in changing the structure of a covariance matrix. To this end, we draw 
10 samples from the 40-dimensional normal Gaussian distribution A^(0, 140), 
with I40 being the 40-dimensional identity matrix. In consistence with the 
previous notations, we denote the covariance matrix calculated based on 



these 10 samples by A. In Fig. 13.21 we use the interpolated contour map 
to represent the structure of A, where the values of the contour levels in 
Fig. 13.21 correspond to the values of the elements of A. For visualisation, we 
use different colours to represent different values, as indicated in Fig. 13.21 As 
one can see, because of the effect of small samples, the sample covariance 
matrix A deviates from the 40-dimensional identity matrix I40. In fact, by 

it can be seen that A is singular 



checking the eigenvalues of A in Fig. [3l 



in the sense that it only has 9 positive eigenvalues. 

The taper matrix B is constructed based on Eqs. fl3.38p . fl3.39p . fl3.40p . 
and f l3.42p . where the length scale in Eq. (13. 420 is chosen as /c = 5. We plot 
the contour map of the taper matrix in Fig. 13.31 As one can see, the main 
diagonal of B consists of values of 1, where the other elements of B are in 



^The eigenvalues are obtained by conducting a singular value decomposition on A. The 
eigenvalues for the other matrices in Fig. 13.51 are obtained in the same way. 
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Row index 



Figure 3.2: Contour map of the sample covariance matrix before conducting 
covariance filtering. 

general less than 1. On the other hand, we also plot the eigenvalues of B 
in Fig. 13.51 Numerical experiments show that the eigenvalues of the taper 
matrix B are always positivq^l, which confirms that B is a positive definite 
matrix, as we expect. 

After conducting covariance filtering, we obtain the tapered sample co- 



variance C = AoB. The contour map of C is plotted in Fig. 



Comparing 



the structure of A in Fig. 13. 21 with that of C in Fig. 13.41 it can be seen that A 



^Some small eigenvalues of B are very close to zero, so it may not be distinguishable 
in Fig. [231 

^Note that in Fig. 13.41 some of the negative values are very close to (in the order of 
10~^-10~^). The negativeness of these elements is not well represented by the colour bar 
because of the scales of the values in plot. 
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Row index 



Figure 3.3: Contour map of the taper matrix. 

and C have the same main diagonal elements. While the other elements of C 
are in general closer to than those of A. In summary, conducting covariance 
filtering "decreases the off-diagonal elements (of a covariance matrix), while 
keeping the (main) diagonal elements unchanged" , having the same effect as 
that reported in [18]. In this way, the spuriously large covariances between 
two different random variables may be reduced. In addition, as indicated in 
Fig. 13.51 after conducting covariance filtering, the tapered sample covariance 
C becomes positive definite, since all its eigenvalues are all positivq^. 

In the context of the EnKF, covariance filtering is normally conducted 

gain, some small eigenvalues of C are very close to zero, so it may not be distin- 
guishable in Fig. 13.51 
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Row index 

Figure 3.4: Contour map of the sample covariance matrix after conducting 
covariance filtering. 

at the propagation step. So at the k-th assimilation cycle, there are three 
matrices, namely, the background covariance P^, the cross covariance P^'', 
and the projection covariance that possibly need to conduct covariance 
filtering. The readers are referred to [18] for the implementation of covariance 
filtering in practice For the reasons given previously, we do not use the 
observation sites in the physical world to construct the taper matrix in our 
analysis. Our implementation strategy is discussed below. 

First, we note that in the EnKF (including both the stochastic EnKF and 

^Note that in the practical implementation, it often assumes that the observation sys- 
tem is linear, which is not a necessary assumption for our implementation 
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Figure 3.5: Eigenvalues of the matrices involved in conducting covariance 
filtering. 

the EnSRF), it is not necessary to calculate the background covariance P^, 
because the subsequent procedures, such as the calculation of the Kalman 
gain, the updating of the background, and the generation of the analysis 
ensemble, do not involve P^. Therefore it is not necessary to conduct covari- 
ance filtering on P^. On the other hand, covariance filtering can be conducted 
on both the cross covariance P^*" and the projection covariance P|^. In the 
EnKF, this can be done either through Eq. fl3.42p so that the construction of 
the taper matrix is based on the matrix to be tapered, or through Eq. (13.431) 
so that the construction of the taper matrix is based on the ensemble mem- 



77 



bers (but in Eq. fl3.43l) the expectation shall be replaced by sample mean). 
Which way to choose may depend on our practical consideration, i.e., which 
one is more efficient in a certain sense. With a moderate dimension of the 
dynamical system in our numerical experiments (cf. Eqs. fl3.44p and fl3.45p ). 
we choose to construct the taper matrix through Eq. (13.421) since it is easier 
to implement the codes (in MATLAB) and runs faster (by using matrices 
rather than loops). 

Also note is that, by conducting covariance ffitering, we introduce an 
extra parameter, the length scale Ic (cf. Eq. (13.421) or (13.431) ), to the EnKF. 
Like the situation in choosing the optimal inflation factor, there still lacks 
a systematic approach to determining the optimal length scale Ic- Thus in 
practice one has to adopt some heuristic methods instead. 

3.4 Example: Assimilating a 40-dimensional 
system 

3.4.1 The dynamical and observation systems 

For illustration, we choose the m-dimensional system model introduced by 
Lorenz and Emanuel [52l [53] (Lorenz-Emanuel 98, or LE 98 for short here- 
after) for our numerical experiments. The LE 98 model is a simplified system 
for modelling atmospheric dynamics, which "shares certain properties with 
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many atmospheric models" ^3]. Its governing equations are given by 
dx ' 

= {xi+i - Xj_2) Xi^i - Xi + F, i = 1, - ■ ■ ,m. (3.44) 

at 

The quadratic terms simulate the advection, the hnear term represents the 
internal dissipation, while the constant F acts as the external forcing [52] . 
For consistency, the variables Xj's are defined cyclically such that x_i = Xm-i, 
Xo = Xm, and Xm+i = Xi. Note that in Eq. (13.441) . there is no dynamical noise. 
One may choose to add some artificial noise to the dynamical system so as to 
improve the performance of a data assimilation algorithm. For example, see 
[9]. In doing this, in effect one increases the background covariance, similar 
to the idea of covariance inflation. Since we have adopted covariance inflation 
in our implementation, here we choose not to introduce any artiflcial noise to 
the dynamical system Eq. (13.441) . so that in effect we let = 0. In this way, 
the implementation of the ETKF can be simpler, since it is more convenient 
to obtain the square roots of the background covariances when there is no 
dynamical noise (cf. the discussion in § I3.3.2p . 

We choose the observer Tik to be a time-invariant identity operator. 
Speciflcally, given a system state = [xk^i, ■ ■ ■ , Xk^mV ^^e k-th assimila- 
tion cycle, the observations are obtained according to 

Yfc = Hfc(xfc) + Vfc = Xfc Vfc, (3.45) 

where follows the m-dimensional Gaussian distribution A^(vfc : 0, 1) with 
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I being the identity matrix. 

Note that Eq. (13.441) represents a set of nonhnear ordinary differential 
equations (ODE). Their exact solutions are intractable. Therefore it is cus- 
tomary to integrate the nonlinear ODEs numerically in practice. This in- 
troduces a discretization to the dynamical system, so that it becomes a 
first-order Markov chain, in the form of Eq. (13. lap . In doing this, the data 
assimilation problem falls into the scenario presented in § 13.21 (by ignoring 
the discretization errors in the dynamical system). 

In our experiments, we set m = 40 and F = 8, and integrate the dy- 
namical system Eq. (I3.44p through a fourth-order Runge-Kutta method [HI 
Ch. 16]. We choose the length of the integration window to be 100 dimen- 
sionless units, and the integration time step to be 0.05 units (corresponding 
to a 6-h interval in reality [53]). Thus there are 2000 assimilation cycles over- 
all. We make the observations of the dynamical system at each assimilation 
cycle. 

3.4.2 Two Measures of filter performance 

We adopt two statistics to measure the performance of the EnKF. One is the 
time-averaged relative (or normalized) rms error {relative rmse for short), 
which is defined as 




k=l 



(3.46) 
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where kmax is the maximum assimilation cycle, x^'' denotes the truth (the 
state of a control run) at the k-th cycle, and ||»||2 means the 2-norm. Note 
that Cr can be interpreted as the time-averaged noise level of the trajectory 
{x^lfc™!"" with respect to the true states. From this point of view, we can 
define the concept of divergence of the EnKF in the following sense: suppose 
that the relative rmse of the observations is ef''", which, with the identity 
observation operator Hk, is defined as 

= 7— E lly^^' - 4'-||2/||xri|2 = 7— E Il^^ll2/l|xri|2 . (3.47) 

If Cr > 6°'^'", then we say the EnKF is divergent because in such circumstances, 
the trajectory {x^}^!!!|"' obtained by the EnKF, on average, is more noisy than 
the observations, which implies that it might not make any sense to use the 
EnKF for data assimilation. 

Note that one may also use the time-averaged absolute rmse 



k 



— Ell^^-^^nh (3.48) 



as the measure. Since we deal with the estimation errors in a finite-dimensional 
space, it can be shown that the norms ||«||2 and ||«||2/c for any positive scalar 
constant c are topologically equivalent [351 Thm 5.36]. However, the relative 
rmse appears to be a more straightforward measure for indicating how good 
(or bad) our estimations are with respect to the true states. For this reason. 
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we adopt the relative rmse throughout this dissertation. 

Another statistic is the time-averaged rms ratio {rms ratio for short), 
which is designed to examine the similarity between the analysis ensembles 
generated by the EnKF and the true states. To see this, we first introduce 
two types of errors with respect to an analysis ensemble = {x^ in 
terms of 

efc,i = ||xfc-x*'-||2, 

1 " (3.49) 

i=l 

Here, e^^i denotes the error of the sample mean in estimating the truth 
x^*", where ek^2 means the average error of the ensemble in estimating x^*^. 
The time averaged rms ratio R is defined as 

r""'-""^"^ . (3^50) 

fc=l ^^=1 EI|Xfc,i-X^^||2 

i=l 

If the truth x^*" is statistically indistinguishable from the analysis ensemble 
X^, then it can be shown that the expectation of the ratio 6^,1/6^,2 is given 
by [6IIE9] 

R, ^ (3.51) 

Therefore, we say that the analysis ensemble X^ is statistically indistinguish- 
able from the truth x^^ if R and R^ are close to one another. The relative 
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position between R and Re also qualitatively reflects the performance in es- 
timating the error covariance, e.g., overestimation or underestimation (cf. 
[H], ISU] and the references therein). R > R^, means that the covariance com- 
puted based on the ensemble = underestimates the estimation 
error, while R < Re implies the opposite, i.e., overestimation of the estima- 
tion error [6ll [89] . 

3.4.3 Additional information of numerical implemen- 
tations of the algorithms 

The nonlinear Kalman filters, including the EnKF in this chapter, and the 
reduced rank sigma point Kalman filters in chapters H] and O may require 
the computations of the eigenvalues and eigenvectors, or the square roots, 
or the inverses of some matrices. Thus for clarity, here we would like to 
explain how we conduct these computations in our experiments. Because of 
the same origin of the nonlinear Kalman filters, the computation schemes 
discussed here are applied in the same way to the EnKF in this chapter, and 
the reduced rank sigma point Kalman filters in the next two chapters. 

Firstly, for evaluations of the eigenvectors and eigenvalues of a symmetric 
matrix, we adopt the spectral decomposition in our computation by treating 
it as a special case of the singular value decomposition (SVD) [29l § 2.5.3]. 

Next, for computations of the square roots of some covariance matrices, 
we note that these matrices are all positive semi-definite (cf. Eqs. fl3.36p . 
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dSSD, dSHD, (I53HD, and ([53ZD)- Therefore, one may in general follow the 
scheme in § 12.3.11 and use the spectral decompositions to compute the square 
roots. 

Finally, we note that the inverse of the matrix (S^)^+Rfc (or P^^+R^) 
is involved when computing the Kalman gain in the nonlinear Kalman filters 
(cf. Eqs. ( Km . (Km . flOHD . and (^M))- The matrix (S^)^ + (or 
P^*^ + Rfc) is normally positive definite, since this is often the case for the 
covariance matrix R^, of observation noise. For this reason, we also use the 
spectral decomposition to compute the inverse of (S^)"^+Rfc (or P^^+R^). 
This is based on the following fact: if C is a symmetric, positive definite 
matrix, so that by spectral decomposition we have C = E'^D'^(E'^)"^, where 
E'^ is the matrix consisting of the eigenvectors of C, and D*^ is a diagonal 
matrix consisting of the corresponding positive eigenvalues of C, then we 
have the inverse C^^ = E^(D^)-^(E'=)^ [29l § 5.5.4]. 

3.4.4 Numerical results 

Now we examine the performances of the stochastic EnKF and the ETKF 
through some numerical experiments. 

3.4.4.1 Effects of the inflation factor 5 and the length scale Ic on 
the performances of the filters 

In this experiment, we aim to examine the relative rms errors and rms ratios 
of the stochastic EnKF and the ETKF as functions of the covariance inflation 
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factor S and covariance filtering length scale Ic- To this end, we let the 
ensemble size n = 10, which represents the typical situation in ensemble 
forecasting, where the ensemble size is normally lower than the dimension 
of the dynamical system. Since there is no systematic method to select the 
optimal values of 6 and Ic, we choose to examine certain ranges of these two 
parameters. For S, we let its value start from and increase by 0.5 each run 
until its value reaches 10. For convenience, we adopt the notation : 0.5 : 10 
to denote this setting. For 1^, we let it increase from 10 to 400, with a fixed 
increment of 20 each run. This setting is accordingly denoted by 10 : 20 : 400. 
Similar notations will be frequently adopted in subsequent chapters. Since in 
this and the subsequent chapters, the numerical experiments involve intense 
computations at different values of various intrinsic filter parameters, we 
choose to run the experiment once for each set of the filter parameters due 
to the limitation of computational resourceq^. 

First, we plot the relative rms errors of the stochastic EnKF and the 
ETKF in Figs. 13.61 and 13.71 respectively. As one can see there, covariance 
inflation and filtering can both improve the performances of the filters given 
suitable values of 6 and Ic- 

Indeed, in Figs. 13.61 and 13.71 within the ranges of the parameters tested, 
when fixing 6, if 6 is not too large (say 5 < 3), a smaller length scale Ic 

^In practice, the typical scenario is that one has fixed observations and the freedom to 
choose the background ensemble. Since the ETKF is deterministic, the randomness only 
lies in the choice of the initial background ensemble, whose effect will be diluted as time 
moves on, especially with the covariance inflation technique. Similar arguments can be 
applied to the nonlinear Kalman filters to be introduced in Chapters [4] and El 
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Figure 3.6: The relative rmse of the stochastic EnKF as a function of the 
inflation factor 6 and the length scale Ic- 

tends to yield lower relative rms errors for both the stochastic EnKF and the 
ETKF. In fact, the lowest relative rms errors of both the stochastic EnKF 
and the ETKF are achieved with 1^ < 110. 

On the other hand, when fixing Ic, the relative rmse of the stochastic 
EnKF exhibits a U-turn behaviour (in terms of the relative rmse) as 6 in- 
creases: when 6 increases from 0, the relative rmse tends to decrease. But 
after 6 becomes larger than a certain value, further increasing 6 will in- 
stead cause a larger relative rmse. Following [58], we explain the U-turn 
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Figure 3.7: The relative rmse of the ETKF as a function of the inflation 
factor 6 and the length scale Ic- 

phenomenon as follows: when there is no covariance inflation, i.e., 6 = 0, 
it can be shown that the error covariance of the EnKF is systematically 
underestimated [89]. This implies that we are over-confident about the back- 
ground. Consequently, the analysis to be updated will rely too much on 
the background, which may cause a relatively large relative rmse, since the 
information content from the incoming observation will possibly be under- 
represented. On the other hand, increasing 6 will make the error covariance 
of the background become larger. This implies that we are more uncertain 
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about the background. Thus if S gets too large, the analysis to be updated 
will rely too much on the incoming observation, which may also cause a 
relatively large relative rmse, since the information content from our prior 
knowledge (the background) will possibly be underrepresented. In contrast, 
a moderate inflation factor 6, as a trade-off between being too large and too 
small, will instead get a lower relative rmse. For the ETKF, one can also find 
the U-turn behaviour in Fig. 13.71 which can be explained in a similar way. 

A comparison between Figs. 13 . 6 l and 13 . 71 reveals that, given the same 6 and 
Ic, the relative rmse of the ETKF is always lower than that of the stochastic 
EnKF. This is consistent with the result reported in [89]. In fact, the relative 
rmse (i.e., noise level) of the observations in our experiment is around 0.22. 
Thus from Fig. 13.61 one can see that the stochastic EnKF is always divergent 
within the ranges of the parameters tested, since its relative rmse is larger 
than 0.22 everywhere. In contrast, from Fig. 13.71 one can see that there exist 
some areas, for example, the one surrounded by the the horizontal axis and 
the contour level curve marked by the value of 0.2, where the ETKF is not 
divergent in the sense that its relative rmse is no larger than 0.2. 

Next, we plot the rms ratios of the stochastic EnKF and the ETKF in 
Figs. 13.81 and 13.91 respectively. From them, one can see that, when fixing Ic, 
the rms ratio is a monotonically decreasing function of 6, while when fixing 
6, the rmse ratio is roughly a monotonically increasing function of Ic, with 
some violations in the ETKF. For example, if we fix 5 = 5 in Fig. 13.91 the rms 
ratio will decrease as Ic increases from 10 to 50, and then start to increase 
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Figure 3.8: The rms ratio of the stochastic EnKF as a function of the 
inflation factor S and the length scale Ic- 

after 1^ is over 50. Since the ensemble size n = 10, we have the expectation 
of the rms ratio ~ 0.74 according to Eq. fl3.5ip . Therefore, if the analysis 
ensemble is statistically indistinguishable from the truth, the rms ratio R 
should be close to Re ~ 0.74. In this sense, both the stochastic EnKF and 
the ETKF can generate analysis ensembles that are indistinguishable from 
the corresponding truths, provided that 6 and Ic are taken within the strips 
between the ratio values of 0.7 and 0.8 such that R ~ 0.74. However, in 
order to obtain better performances in terms of the relative rms errors, both 
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Figure 3.9: The rms ratio of the ETKF as a function of the inflation factor 
6 and the length scale Ic- 

the stochastic EnKF and the ETKF should take 6 and Ic outside of the 
aforementioned strips, so that the corresponding rms ratios are lower than 
the expectation 0.74. According to the discussion in § 13.4.21 (also cf. |6l [89]). 
this implies that the error covariances of the analysis ensembles are over- 
estimated. Thus these experiment results confirm the benefit of conducting 
covariance inflation. 
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Figure 3.10: The relative rms errors of the stochastic EnKF and the ETKF 
as functions of the ensemble size n. 

3.4.4.2 Effect of the ensemble size on the performances of the 
filters 

Now we examine the relative rms errors and rms ratios of the stochastic 
EnKF and the ETKF as functions of the ensemble size n. To this end, we 
let 5 = 3 and Ic = 50 for both the stochastic EnKF and the ETKF. The 
ensemble size n increases from 5 to 40 with a fixed increment of 1 each run. 
This setting is denoted by 5 : 1 : 40. 

We plot the relative rms errors of the stochastic EnKF and the ETKF in 
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Fig. I3.1U[ For the stochastic EnKF, the relative rmse drops rapidly when n 
increases from 5 to 10. After that, further increasing n does not reduce the 
relative rmse significantly. For the ETKF, the situation is slightly different. 
The relative rmse also drops rapidly when n starts from n = 5. However, as 
n increases, the relative rmse exhibits the U-turn behaviour, with the lowest 
relative rmse attained at n = 13. For n > 13, the ETKF with a smaller 
ensemble size (say n = 20) performs better than the ETKF with a larger 
one (say n = 40). A possible explanation of this phenomenon is postponed 
to § 14.7.3.31 in the next chapter, since we feel this phenomenon is better 
explained there. 

Comparing the curves in Fig. 13.101 one can see that, with the same ensem- 
ble size n, the ETKF always outperforms the stochastic EnKF. The stochas- 
tic EnKF is divergent for all the ensemble sizes tested, in the sense that the 
corresponding rms errors are always larger than 0.22 (the relative rmse in 
the observations). In contrast, the ETKF is not-divergent for n >9. 

We also plot the time-averaged rms ratios of the stochastic EnKF and 
the ETKF in Fig. 13.111 For both the filters, their rms ratios decrease mono- 
tonically as the ensemble sizes n increase. To make the rms ratios close to 
0.74, the ensemble sizes of both the filters should be less than 8. Increasing 
n will lead to smaller rms ratios and thus cause over-estimations of the error 
covariances. This, however, will benefit the performances of the filters in the 
sense that the corresponding relative rms errors are lower. 
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Figure 3.11: The rms ratios of the stochastic EnKF and the ETKF as 
functions of the ensemble size n. 

3.5 Summary of the chapter 

In this chapter we considered the data assimilation problem in nonlinear 
(discrete) systems, where both the dynamical and observation noise are as- 
sumed to follow some Gaussian distributions. We used recursive Bayesian 
estimation (RBE) to derive an approximate Monte Carlo solution to the data 
assimilation problem, which turned out to be consistent with the ensemble 
Kalman filter (EnKF) in the literature. We noted that for the validity of the 
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deduction, apart from the Gaussianity assumptions for both the dynamical 
and observation noise, it is also necessary to assume that the system states 
also follow some Gaussian distributions. 

Depending on whether to perturb the observations or not, the EnKF can 
be classified as two types: the stochastic EnKF and the ensemble square 
root filter (EnSRF). Our numerical results showed that the ensemble trans- 
form Kalman filter (ETKF), as a representative of the EnSRFs, consistently 
outperformed the stochastic EnKF. 

In order to improve the performance of the EnKF, we introduced two 
auxiliary techniques, namely covariance inflation and flltering. We also dis- 
cussed the connection between the covariance inflation technique and the 
Kalman filter with fading memory. Through some numerical experiments, 
we illustrated the benefits of adopting these two auxiliary techniques in the 
EnKF. 
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Chapter 4 

Unscented and scaled 
unscented Kalman filters for 
data assimilation 

4.1 Overview 

The conventional Kalman filter (KF) is simple but general for linear/Gaussian 
systems. However, when it comes to nonlinear or non-Gaussian systems, its 
optimality is often lost. Since the appearance of the KF, lots of works were 
dedicated to extending the KF to nonlinear and/or non-Gaussian systems. 
In fact, apart from the ensemble Kalman filter (EnKF) introduced in the 
previous chapter, there are some other types of extensions, for example, the 
extended Kalman filter (EKF) and its variants, the iterated EKF and higher 
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order EKF [7Sl ch. 13], the unscented Kalman filter (UKF) [10] and its gen- 
eralization, the scaled unscented Kalman filter (SUKF) ^44^, and the divided 
difference filters (DDEs) [2ZIE11ES]- All these filters are intimately related to 
the conventional KF. For this reason, we call them nonlinear Kalman filters 
in this dissertation. Note that these nonlinear Kalman filters are not de- 
signed for the data assimilation problem in nonlinear/non-Gaussian systems. 
Instead, they yield approximate solutions for nonlinear/ Gaussian systems. 
Here by "nonlinear/Gaussian", we mean that not only are the dynamical 
and observation noise Gaussian, but also the underlying system states, as we 
have pointed out in § 13. 2[ 

Similar to the derivation of the EnKF in § 13.21 one may also derive other 
nonlinear KFs from the point of view of recursive Bayesian estimation (RBE). 
As a result, one can also split the procedures of a nonlinear KF into the 
propagation (or prediction) step and the filtering step. For all of the nonlinear 
KFs, the main operations at the filtering step are the same, which update the 
mean and covariance of the background to the corresponding statistics of the 
analysis, in the same way as the conventional Kalman filter does. Thus in 
general, it is the approach to dealing with the nonlinearity at the propagation 
step that distinguishes different types of nonlinear KFs. 

Note that under the assumption of Gaussianity, in order to estimate the 
pdf of a Gaussian distribution, it is sufficient to estimate its mean and covari- 
ance. Thus for a nonlinear Kalman filter, the pdf approximation problem at 
the propagation step Eq. (]1.3bp of RBE is equivalent to the problem in esti- 
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Figure 4.1: The recast problem at the propagation step of a nonhnear KF. 
mating the mean and covariance of the background, which itself can be recast 



as the estimation problem in the following scenario: as shown in Fig. 14. ![ we 
suppose that there is a Gaussian random variable x with mean x and co- 



variance Px, which is transformed 
random variable 77, so that 77 = JF (x^ 



w a nonlinear function JF into another 
J. Our objective is to estimate the mean 



f] and covariance of the transformed random variable rj. 

To solve the recast problem, the idea of the EnKF, as introduced in the 
previous chapter, is to generate some samples of system states and propa- 
gate them forward. Then the mean f] and covariance of the transformed 
random variable 77 are estimated as the sample mean and sample covariance 
of the propagations. 

Apart from the EnKF, there are a few other methods to tackle the recast 
problem. One such method is the (first order) extended Kalman filter (EKF) 
J. The idea of the EKF is to expand the nonlinear function JF around the 
mean x up to first order in a Taylor series expansion. For example, let 



"'^The more general scenario, where 77 = J-'(x, u), with u being Gaussian noise, is dis- 
cussed in Appendix [Cl 

^For convenience, hereafter whenever we say extended Kalman filter, we mean the first 
order approximation by default. 
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X = X + (5x, where (5x represents a small perturbation, then 



(x + 5x) = (x) + F|x (5x + o ((5x) , 



(4.1) 



where F|x denotes the Jacobian matrix of T evaluated at x, and o ((5x) rep- 
resents the higher order terms in the expansion. Thus if 5x is sufficiently 
small, or if the system under assimilation is weakly nonlinear, so that higher 
order derivatives of the function T are relatively small compared with the 
Jacobian F|x, then o (5x) can be neglected in computation. To this end, let 
5?7 = JF (x + (5x) — T (x) , then the nonlinear system in Eq. fl4.ll) is approxi- 
mated by a linear one 



by neglecting o(5x). Thus the conventional Kalman filter introduced in 
Chapter [2] can be used to assimilate the approximate system Eq. (14.21) . 

In order to implement the EKF (or its higher order variants, see [73t 
Ch. 13]), one has to evaluate the derivative(s), e.g., Jacobian or even Hessian, 
of the nonlinear function JF, which is often inconvenient in implementation. 
For this reason, we will not investigate the performance of the EKF in this 
dissertation [f]. Instead, we will introduce two other types of nonlinear KFs 
developed in recent years, namely the unscented Kalman filter (UKF) [IQ] 

•^Another reason is that, it can been shown analyticaUy that the unscented Kalman 
filter to be introduced later systematically outperforms the EKF [40 . The higher order 
variants of the EKF may perform better than the EKF itself. However, the complication 
in evaluating higher order derivatives prevents their spread in practice. 



bl] ^ F|x 



(5x 



(4.2) 
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Table 4.1: Different ways of tlie nonlinear Kalman filters in dealing with the 
nonlinearity at the propagation step. 



Filter 


Idea 


Extended Kalman filter 


Linearizing the nonlinear function T 


Ensemble Kalman filter 


Taking average over the propagations 
of ensemble members (see Ch. [3]) 


Unscented Kalman filter; Scaled un- 
scented Kalman filter 


Taking weighted average over the prop- 
agations of sigma points (similar to the 
ensemble Kalman filter) 


Divided difference filters 


Interpolating the nonlinear function 
T by Stirling's interpolation formula 
(similar to the extended Kalman filter) 



and its generalization, the scaled unscented Kalman filter (SUKF) [44j, and 
the divided difference filters (DDEs) [371 EH EH], for the data assimilation 
problem in nonlinear/Gaussian systems. One advantage of these filters is that 
they avoid the necessity of evaluating the derivatives of a nonlinear function. 
Instead, they all produce some specially chosen system states, called sigma 
points, for the purpose of approximation. Eor this reason, they are uniformly 
called sigma point Kalman filters (SPKEs) [82] or derivative-free filters [72] . 

More details of the UKE, the SUKE and the DDEs will be presented in 
this and the next chapters. As a summary, we provide in Table 14.11 brief 
descriptions of how some of the nonlinear KEs handle the nonlinearity at the 
propagation step. 

The remainder of this chapter is organized as follows: although the prob- 
lem in study is the same as that in § 13.21 we choose to re-state it in § 14.21 for 
completeness. In § 14. 3[ we introduce the unscented transform (UT) as the 
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approximate solution to the recast problem in Fig. 14.11 We then proceed to 
introduce the scaled unscented transform (SUT) in § 14.41 as the generalization 
of the UT. Applying the SUT to the propagation step of RBE leads to the 
scaled unscented Kalman filter (SUKF), as will be seen in § 14.51 To apply 
the SUKF to assimilate high dimensional systems, we introduce the reduced 
rank SUKF in § 14.61 In § 14.71 we use the 40-dimensional Lorenz-Emanuel 
system as the testbed to illustrate the details in implementing the reduced 
rank SUKF, and to investigate the effects of the intrinsic filter parameters 
on the performance of the SUKF. We draw our conclusion for this chapter 
in§|iSl 
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4.2 Problem statement 



We are interested in the data assimilation problem in the same family of 
nonlinear/ Gaussian systems as described in Eq. (13.11) . i.e., 



Xfc = Mk,k-1 (Xfc_l) + Ufc , 


(|3.1a|) 


Yk = Hk (xfc) + Vfc , 


f|3.1bj) 


Ufc ~ (ufc : 0, Qfc) , 


(|3.1c|) 


Wk^N (vfc : 0, Rfc) , 


f|3.1dj) 


E (ujUfc ) = 4,iQfc , 


(|3.1e|) 




(|3.1ft) 


E(uivj)=0 V^,j. 


(3.1g) 



The approximate solutions to the above problem, in terms of the sigma 
point Kalman filters (SPKFs), are given in this chapter (for the UKF and 
the SUKF), and the next chapter (for the DDEs), respectively. As we have 
pointed out previously, the nonlinear KEs differ from each other mainly at the 
propagation step, where the problem can be recast as the estimation problem 
in Eig. 14. 1[ Therefore, in this chapter, we first discuss how to solve the recast 
problem through the unscented transform (UT) and its generalization, the 
scaled unscented transform (SUT). Incorporating the UT and the SUT into 
the propagation step of RBE leads to the unscented Kalman filter (UKE) 
and the scaled unscented Kalman filter (SUKE), respectively. 
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4.3 Unscented transform 
4.3.1 Basic idea 

The idea of the unscented transform (UT) is based on the intuition that "it 
is easier to approximate a probabihty distribution than it is to approximate 
an arbitrary nonhnear function or transformation" [ID]. To see this, we use 
a continuous nonhnear transform 77 = JF (x) for iUustration. Suppose that x 
is an m-dimensional random variable (not necessarily Gaussian), with mean 
X and covariance P^:, and that S^; is an m x L square root of P^. such that 
Px = (Sx)'^ . We generate a set of 2L + 1 specially chosen states {Xi}'f^Q, 
called sigma points, with respect to the triplet (A,x, S^,), according to the 
following formula: 

Xq = X, 

A'i = x + yLTA(S,),, z = 1,2,--- ,L, (4.4) 
Xi = ^- VL + X (S^),_^ , z = L + 1, L + 2, ■ ■ ■ , 2L, 

where (S^:)^ denotes the i-th column of the square root matrix S^^, and A 
is an adjustable parameter satisfying some constraints (see § I4.7.2p . The 
reason to introduce A is that it can influence higher order moments (e.g., 
kurtosis) of the set Thus one may use this flexibihty to reduce the 

approximation error of sigma points in higher order moment matching ^0] , 
as will be shown later. 
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We also allocate a set of weights {VFjj^^Q 
Wo ^ 



to sigma points {A'ijf^o- ^^is way, it can be verified that the weighted 
sample mean and sample covariance of the set {A'j}|^Q, given by 

2L 



=0 
2L 

Wi [x, - [x, - X 



2L 

Px = 

i=0 



'=° (4.6) 



match the mean x and covariance P^. of the random variable x, respectively. 
If X follows a Gaussian distribution, then it is suggested that A be chosen as 
A = 3 — L, so that the kurtoses of {Xi\fIlQ can match as many as possible of 
those of the random variable x |^ HI] . 

Because of the symmetry in sigma points, the rank of the matrix Pa- is 
L. To avoid rank deficiency in Px^ it is suggested that the number of sigma 
points be larger than twice the dimension of the vector x, or equivalently, 
L > m [lO]. However, for high dimensional systems, this restriction should 
be relaxed in order to reduce the computational cost, as will be discussed 
later. 

2L 

The weights {M/jj^^o satisfy the normalization condition ^ Wj = 1. How- 

i=0 

ever, they might be inconsistent with the conventional interpretation of the 
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weights of samples of a distribution. This is because A can be negative, so 
that Wo is also negative. The negativeness of Wq also causes another prob- 
lem, in that the sample covariance P.^ (cf. Eq. (I4.7bl) ) of the transformed 
sigma points may not always be positive semi-definite. 

One remedy to the above problem is to simply let A always be non- 
negative. In particular, by letting A = such that Wq = 0, one in effect 
propagates (or transforms) sigma points {A'jjf^o forward except for the center 
Xq. This scheme, i.e., excluding the center of sigma points, is known as 
positive-negative pairs (PNP) in the literature of data assimilation (cf. [SZ] 
and the reference therein). In this sense, the PNP scheme can be deemed as 
a special case of the UT. In both the UT and PNP schemes, sigma points, 
with or without the center, will have the same sample mean X and sample 
covariance P;^, regardless of the choice of A. The advantage of adopting the 
UT, however, lies in the fact that, in the UT there is an additional parameter 
A, which provides an extra freedom to influence higher order moments (e.g. 
kurtosis) of sigma points [lOl Appendix II]. Moreover, from Eq. (14. 4p one 
can also see that A also affects the distances of the other sigma points to the 
center (but without affecting their sample mean and sample covariance). 
This may be desirable in some situations, where one wants to explore the 
dynamics of a nonlinear transform in different scales through an ensemble of 
system states, but does not wish to change the sample mean and covariance 
of the ensemble. We will come back to the issue of positive semi-definiteness 
later and present another remedy following the work ^44j . 
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To estimate the mean and covariance of the transformed random variable 
1], we first conduct the nonhnear transform on the set of sigma points {A^jf^Q 
so as to obtain a set of transformed sigma points {3^j : = {Xi)}f^Q. As 
the estimations of f] and P^, the sample mean fj and sample covariance 
are given by 

2L 

i=0 
2L 

Pv = Y.^^ - ^) -vf + f^ - V) (yo - vf , (4.7b) 

1=0 

where the scalar (3 is also an adjustable parameter. In the original work [lO] . 
the term /3 (3^o — v) (3^o — v)'^ does not appear on the right hand side (rhs) 
of Eq. (14.7bp . However, introducing this additional term has the following 
benefits: firstly, it can reduce the approximation error. For example, it was 
shown in |31] that, if x follows a Gaussian distribution, (3 = 2 yields a better 
covariance estimation than /5 = 0. Secondly, since the weight Wq can be 
negative, it is not guaranteed that the first term ^ Wi (3^, — fj) (3^j — fj) on 

1=0 

the rhs of Eq. (14.7bl) is positive semi-definite. However, by adding the second 
term, the effective weight of the transformed sigma point 3^o in Eq- (14.7bp 
becomes Wq + (3. Thus by choosing an appropriate value for (3, we can pro- 
vide some compensation so that the sample covariance P^ is guaranteed to 
be positive semi-definite. Finally, a positive value of (3 increases the error co- 
variance Pr,. This is similar to the covariance inflation technique introduced 
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in § 13.3.3. 1[ Thus it may improve the performance of a filter given a suitable 
value of (3. 

For convenience, we summarize the main procedures in the UT as follows: 
Generation of sigma points: 



Xq = X, 

A', = x+v/lTA(S,.),, ^ = 1,2,--- ,L, (SaD 



Xi = ^-VL + X (S,),_^ , ^ = L + 1, L + 2, ■ ■ ■ ,2L. 



Allocation of associated weights: 

A 



Wo 



Estimations of the mean and covariance of the transformed random variable 
rj: 

= J^iXi), 1 = 0,1,- ■■ ,2L, 

2L 

i=0 
2L 

P. = E - ^) -vf + f^ (3^0 - ^) (3^0 - vf . 



i=0 
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4.3.2 Accuracy analysis 

We now conduct an accuracy analysis of the UT, which mainly follows the 
ideas in [10l[82]. 

For our purpose, we first consider the Taylor series expansion of a nonlin- 
ear function T around x, the mean of the m-dimensional Gaussian random 
variable x ~ (x : x, P^;). Let x = x + (5x such that 5x ~ (5x : 0, Px), 
then 



r/ = ^(x + 5x) =^(x) + D5x-F+5|^ + --- . (4.9) 

d d 



Let V = — — , ■ ■ ■ , — — be the gradient operator, then the operator 

OXi OXm 



D5x = 5x'^V = ^5x,— (4.10) 



' dxi 
1=1 * 



acts on JF on a component-by-component basis [13]. For example 



D,.^ = ^, (4.11) 

i=l * ^ 



where 

dJ" fdFi dF2 dFk^^ 



dxj \ dxi dxj dxj 



(4.12) 



given that JF = (Fi, F2, ■ ■ ■ , Fk) is a fc-dimensional vector function. Since 
all of the derivatives of J-' in the expansion are evaluated at x, for ease 
of notation, hereafter we may often drop the localization information. For 
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example, we use dT jdxi to represent dT ldxi\y^ when it causes no confusion. 

Note that, the operator V only acts on the nonlinear function T or itself, 
but not on the perturbation 5x. This point will be useful in our deduction. 
For example, to compute 77 = E (r/) according to Eq. (14. 9p . one has to evaluate 
the expectation of the second order term E (D^^JF), which, by definition, can 
be re-written as 



E (D^^^) = E [(5x^V) ((5x^V) T\ 
= E [V^ (5x5x^) VJ^] 
= V^E ((5x5x^) VJ^ 



(4.13) 



To interpret the result in the above equation, one may treat V as a constant 
vector, which acts on V itself and only, but not on the covariance P^,. 
For illustration, we consider a two dimensional case, where (5x = {6x1, 6x2) , 



d d 



dxi ' dx2 



, and 



pl2 



p21 p 



22 



(4.14) 



with P^" = E (6x1), = E (6x1), and P^^ = E {6x16x2) = Pf = E {6x26x1). 
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In this case, we have 



d d 



pU pl2 



dXi '9x2/1 p21 p22 



dxi 
d 

\dxi/ 



P. 



11 



d_ 

dxi 



+ P 



.21 



d 



)12 



22 



9 



9xi ^ dx2 



A2 /}2 ^52^7 



9xf 



dxidx2 ' ^ dxi 



\dX2/ 



(4.15) 



which is consistent with the result in 

Applying the above principle and noting that E (5x) = 0, the mean and 
covariance of the random variable r/ in Eq. (14.91) are then given by 



?7 = E (?]) 



^(x) + i (V^P. V) ^ + i E (dL-F) + 



E 



(Vj^)^P^.(VJ^) +E 



D,.^(DL^)' 



+ 



V^P.V 



V^P.V 



+ 



(4.16a) 



(4.16b) 



In particular, if both the pdf p (5x) of 5x and its support are symmetric 
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a 



about the origin □, the odd order (central) moments of the m-dimensional 
random variable (5x = {6xi, ■ ■ ■ , Sxm)'^ are all zero, i.e., 



E 



i=l 







(4.17) 



if the summation p = of the non-negative integers > is an odd 



1=1 



integer. In this case, it can be shown that 



E (D^,,J^) = E ((5x'^V5x^V(5x^VJ^) 
= V^E (5x5x^V5x^) VJ^ 
= 0. 



(4.18) 



Thus Eq. f l4.16ap can be further reduced to 



V = -F(x) + ^ (V^P. V) + ^ E (dL-F) + 



(4.19) 



To analyze the accuracy of the UT, we also need to expand the trans- 
formed sigma points {3^j : = ^ ('^i)}f£o around = x. Let 6Xi = — x. 



* p ((5x) is not necessarily Gaussian. For example, it can be a uniform distribution on 
the interval [—1,1] in one dimensional case. 
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then according to Eq. fl4.4l) . 



= I 



for i = , 
for i = 1, ■ ■ ■ , L , 
VLTX{S^)i_L, foTt = L + l,--- ,2L. 







VlT\(s 



XJi ; 



(4.20) 



Substituting 6Xi into Eq. (14. 9p . the sample mean fj of the transformed 
sigma points are then given by 



2L 



1=0 
2L 



1=0 



^(x) + 5a;'^v^ + - vj^ 



' 2L \ / 2L \ ^ ^ / 2L \ 

5^ H^, ^(x) + J2 ^^^^^ + 

^i=0 / \i=0 / \i=0 / 

/2L \ 2L 

\i=0 / ■ j=0 

(4.21) 
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From Eqs. (IQ]) - (H^ . it is evident that 

2L 

i=0 
2L 

^ ly.^A;, = ; (4.22) 

i=0 
2L 

1=0 

Moreover, because of the symmetry in sigma points, we have 

5Xi5XjW5X^ + 5XL+^5Xl^^V5Xl^^ = 0, for z = 1, ■ ■ ■ , L . (4.23) 



Thus it is clear that 

2L 



WibXibXjVbXj = . (4.24) 



i=0 



Substituting the above identities into Eq. (14.211) . we have 



2L 



i=0 

1 1 

= ^(x) + - V^P. + - 5^ W,Bt^^J^ +■■■. 



i=0 



(4.25) 



Thus by comparing the UT estimation r) in Eq. (14.251) with the expecta- 
tion f] in Eq. (14.191) . it can be seen that the expansion of r) matches that of fj 
up to third order term (i.e., the term E (D^^JF) in Eq. (I4.16ap ). and in gen- 
eral differs from fj at fourth order. However, in some special situations, for 
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example, if JF is a nonlinear function with fourth and higher order derivatives 
being zero, then fj is equal to fj. 

In contrast, in the extended Kalman filter (EKF), an unbiased estimation 
fj based on the linearization scheme in Eq. (14. ip is given by 57 = JF(x), which 
matches the expansion of f] in Eq. fl4.19p only up to first order (E(D5x^)), 
and is therefore less accurate than the UT in this sense. 

The same arguments can be applied to study the accuracy of the covari- 
ance estimation P„. To this end, we first consider the estimation without 



the compensation term P (3^o ~ v) (3^o ~ v) Eq. fl4.7bp 



2L 



Pv = Y.^^ (^^-^) (^^-^)' 



1=0 



2L 



i=0 



V^P.V 



V^P.V 



T 



(4.26) 



Comparing P^ with its expectation P^ in Eq. (14.16bp . one can see that the 
expansion of P^ matches the two terms in the expansion of P,, that contain 
{VJ^fP^ (VT) and [(V'^P^V) [(V^P^V) ^f. But the other terms in 
the expansion of P^ might differ from those in the expansion of P^. 

Similarly, in the EKF, the estimation P^ based on the linearization scheme 
in Eq. (14. ID is given by P,, = (VJ-')^P^ (VJF), which matches only the first 
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term in the expansion of in Eq. (14.16bp . and thus is also less accurate 
than the estimation in Eq. fl4.26p obtained through the UT. 

It is also possible to apply similar arguments in this chapter to analyze the 
accuracies of the mean and covariance estimations of the ensemble Kalman 
filter (EnKF). This is done in Appendix O The analytical results there 
indicate that, under the assumption of Gaussianity, estimations based on the 
UT can avoid some sample errors and bias that appear in the EnKF due to 
the effect of finite ensemble size. 

The benefits of introducing the additional term /3 (3^o ~ v) (3^o ~ ^7)^ will 
be discussed again in the next section. 



4.4 Scaled unscented transform 

From Eq. (14. 4p we see that, A is a parameter that affects the distances of 
sigma points to their center. With nonlinearity in the transform function JF, 
A is also a parameter that affects the higher order moments of the transformed 
sigma points. The scaled unscented transform (SUT) extends this idea by 
further introducing a scale parameter a to the UT |44j . 

To see this, we first construct an auxiliary random variable 

z = ^(x) + -^(" + "^"^--^("\ (4.27) 

where a and yU are two free parameters (/i 7^ 0). Eq. (I4.27P is similar to 
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the Taylor series expansion in Eq. fl4.9p up to first order, in that the sec- 
ond term on the rhs of Eq. fl4.27p can be considered as a divided difference 
approximation to the term D^x^ in Eq. fl4.9l) . Compared with the idea of 
hnearization in Eq. (14. ip to construct the extended Kalman fiher (EKF), the 
advantage of taking the form in Eq. fl4.27p is that there is no need to evaluate 
the derivatives of T . What we need to do next is to evaluate the mean and 
covariance of the transformed random variable 77 = jF(x + 5x) based on the 
auxiliary variable z. 

In analysis, we also expand JF(x + a 5x) around x such that 



0? DLJ^ . 0? D^^J- 



z = ^(x) + -D,x^+ ^ + 



+ 



(4.28) 



Then the mean z and covariance P* = /i are given by 



z = E(zl 



,z - z z - z 



/iE 

"'(V^r P, (V^) + ^<!E 
/i 2/i 

bjj, 



E 



(4.29a) 

(4.29b) 
DL^(D5x^)^]} 



D,x^ (dL^) 



+ E 



dL-F(d,x^)^ 



4 

+ ^ {e [dL-f (dL^)^] - (v^p.v^) (v^p.v^)^} 



Comparing Eq. (I4.29P with Eq. (14.161) . it is evident that when a = = 1, 
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we have f] = z and = P* Q If a 7^ 1, one may use z and P* as the 
approximations to fj and P,,, respectively. To this end, a natural choice is 
to let fi = cP' . Thus under the assumption that (5x is Gaussian such that its 
odd moments are all zero, Eq. fl4.29p is reduced to 



z = Efzl 



(4.30a) 



1 o? 
^(x) + - (V^P.V^) + -E + 



z - z z - z, 



(4.30b) 

+ e[dL^(D5x^)^]} 
+ ^ {e [dL-f (dL^)^] - (v^p.v^) (v^p.v^)^} + ■ ■ ■ . 



{yTf p. (V^) + y {e \p,^T (DL^)^ 



With this choice, z and P* agree with f/ and P.^ up to second order (moment) 
term (i.e. the term that contains only one P^). Other higher order terms 
scale with the parameter a. 

When < 1, P* underestimates P^, with — > being the extreme 
situation (equivalent to the covariance estimation in the EKF). In contrast, 
o? > \ means that P* overestimates P,,. This is similar to the covariance 
inflation technique in § 13.3.3.11 and thus is desirable provided that o? is not 
too large. The difference AP = P.^ — P* between P^ in Eq. fl4.16bl) and P* 



^(5x is assumed to follow the Gaussian distribution iV((5x : 0, Px), hence in the second 



line of Eq. g29b|, we have E Ti^^T {plj")^ =E Ti'lJ'i^s^Tf 



0. 
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in the "worst" case — 0, is therefore given by 



AP = E 



»2 T\T 



6 



V^P.V 



6 



V^P.V 



(4.31) 



+ 



To further reduce the approximation error AP, a simple idea is to introduce 
an extra term, /? (z — ^(x)) (E (z) — jF(x))"^ in the expression of P*. From 
Eq. (OUil) . 



f3{z- J-(x)) (z - ^(x))^ = ^ (V^P.V^) (V^P.V^)^ + ■ ■ ■ . (4.32) 



Thus by choosing a proper value of (3 (which itself depends on the distribution 
of x), one may reduce the difference AP in Eq. (I4.3ip . For example, if 5x 
follows a univariate normal distribution A^(5x : 0, 1), then it can be shown 
that 



E (d^,^ (D^^^)^) = 3 (V^P,.V^) (V^P.V^)'' . (4.33) 

Thus choosing (3 = 2 will reduce the approximation error in the fourth order 
(moment) terms, so that 



AP = P, - P: - /3 (z - ^(x)) (z - ^(x))^ 



E 



D,.^(DL^_f ^D|,^(D5.^)^ 



6 



6 



(4.34) 



+ 
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Other schemes to reduce AP should also be possible. However, they might 
be more complicated in implementations. 

In practice, to estimate z and P*, we apply the UT introduced in the 
previous section. To this end, we first generate a set of sigma points {A'jj^^Q 
with respect to the quartet {a, A, x, S^;) such that 

Xq = X, 

A", = X + ay/LTX (S,), , z = 1, 2, ■ ■ ■ , L, (4.35) 



A", = X - aVL + X iS^),_L ,i = L + l,L + 2,--- ,2L. 
According to Eq. fl4.27p . the transformed sigma points {2j}f=o given by 

(4.36) 

The estimated mean z and covariance P* (with the compensation term), are 
given by 




2L 

z = 

1=0 
2L 



^W,,^., (4.37a) 

1=0 
2L 

P: = Y.^^ - ^) - z)^ + (z - ^(x)) (z - ^(x))"^ , (4.37b) 



1=0 



where Wi are the weights determined by Eq. (14. 5p . 

The sample mean z and sample covariance P* obtained in Eq. f l4.37p will 
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then be used as the approximations to fj and P^, respectively. However, one 
possible problem is that z and P* are expressed in terms of the set of sigma 
points whose physical meanings might be hard to interpret. Thus 

it is preferable to express z and P* in terms of the transformed sigma points 
{3^i : yi = ^ {Xi)}'ito of the original nonlinear system. To this end, we note 
that Zi = T (x) H — - (3^j — T (x)) is just a linear transformation of y^. Thus 



we have Il2 



2L 



f^ = ^ = Y^Wty,, (4.38a) 

i=0 

P, = P: = ^ W,^ (3^. - fi) {y, - r))^ + (1 + /5 - o?) (3^0 - f\) (3^0 - f\f , 



i=0 



(4.38b) 



where the weights, VT/, are given by 



(4.39) 



It can be verified that the sigma points in Eq. (14.351) . associated with the 
weights W? also capture the mean x and covariance P^^ of the random vari- 
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able X, so that 



2L 



1=0 

2L 

X = J2WiXi = 5^; (4.40) 

1=0 
2L 



T 



Note that in Eq. fl4.38bl) there exists an extra term (1 — a^) (3^o — v) (3^o ~ v)'^ : 
which is due to the introduction of the scale parameter a. This does not ap- 
pear in the unscented transform (UT), where a = 1. 

We summarize the main procedures of the SUT as follows: 
Generation of sigma points: 

Xo = x, 

Xi = ^ + aVTTX{S,)^,t = l,2,--- ,L, <^Ml 



A'i = X - aVL + X (S^) , i = L + 1, L + 2, ■ ■ ■ , 2L. 



Allocation of associated weights: 



Estimations of the mean and covariance of the transformed random variable 
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7]-. 

2L 

1=0 
2L 

P. = E - ^) -vf + {i+rj- c^o - v) (yo - vf . 

The accuracy analysis of the SUT can be conducted in a similar way to 
that in § I4.3.2t but with the weights Wi of the UT therein replaced by the 
weights Wf of the SUT. Thus here we do not repeat it. 

Since the UT can be considered as a special case of the SUT, hereafter 
we will use the SUT in general discussions and drop the superscripts in the 
weights of the SUT. 

4.5 Scaled Unscented Kalman filter as the 
approximate solution 

Applying the SUT to the propagation step of RBE leads to the scaled un- 
scented Kalman filter (SUKF). Without loss of generality, we assume that at 
instant A; — 1, a set of sigma points jj^^p"^ with respect to the quartet 

(a, A,x^_]^, S^"]^) is available, where a and A are the same parameters as in 
Eq. fl4.35p . ^1-1 analysis mean, and S^ti is a square root matrix (with 
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column vectors) of the analysis error covariance P^.i- We also assume 
that the associated weights are {W^fc-i,i}^=o~^- 

Following [101 SH [H21 [S3] , the main procedures of the SUKF are also split 
into the propagation and filtering steps. 

4.5.1 Propagation step 

The ensemble mean and covariance are evaluated according to the 
following formulae: 

xt, = Mk,k+i {X^-i,) ,^ = 0,■■■, 2L,_i , (4.42a) 

2ife-l 

±l=Yl ^^'-^A. ' (4-42b) 

i=0 

+ {l + P-a') (xt,o - xt) Ko + Qk. 

To compute the Kalman gain K^, it is customary to first compute the 
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cross covariance P^^ and the projection covariance P^'' [101 SI], given by 

2ife-l 

yk=Yl Wk-i^Hk , (4.43a) 

2ife-i 

Pk = E ^'^-M - {nk (xtj - Jkf (4.43b) 

j=0 

+ {l + P-a') (xt,o - ^l) [n, (xt,o) - yfe)"" , 

^-^'fc — 1 

pf = E ^'^-i.^ (^^ K.) - y^^) (^'^ W.) - (4-43C) 

i=0 

+ (1 + /5 - a^) [Hk (xt,o) - yfc) (^fe (x^o) - y^^ • 

As in the ensemble square root filter (EnSRF), we re- write the above 
covariances in terms of some square root matrices. To this end, we introduce 
two square roots, and S^, which are defined as 



Wf^ifl (4,0 - xt) , v^W^ (xt,i - xy , ■ ■ ■ , sJwZ^, (xt,2L,_, - xt) 

(4.44a) 



{^k (xt,o) - Yk) , v/W^ [m (xt,i) - yfc) , 
Wu-i^L,., (n, [4,2L,_,) - Yk)] , (4.44b) 



where Wj^^^ ^ = Wk-i,o + l + (3-a\ Then the covariances can be re-written 
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as 



PI = Sf {Sf)^ = SliStf + Q,, (4.45a) 
P- = S^S^)^ (4.45b) 
P^ = S^(St)^ (4.45c) 

where S^'' is a square root of P^, which can be obtained by letting S^^ = 
■\J Si (Sl)'^ + Qk, following the numerical scheme in § 12.3. ll r. In particular, 
if there is no dynamical noise, then it is customary to let S^^ = S^. 

Finally, the Kalman gain can be calculated in terms of the square 
roots such that 



(4.46) 

Sfc (Sfc) (Sfc(Sfc) +Rk 



4.5.2 Filtering step 

Once a new observation is available, one updates the sample mean and sample 
covariance of the background, so that 

±l = ±i + Kk{yk-nk{K)), (4.47a) 
Pt = Pl-KJPzY . (4.47b) 



•^In the context of the SUKF, there is actually no need to compute S^^ [40l|44]. This, 
however, is not true for the divided difference filters, as will be seen in the next chapter. 
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To obtain a square root S^'^ of P^, one may adopt the numerical scheme in 
§ l2.3.1l to factorize the updated covariance as = (S^*^)^. 

Having the updated sample mean and a square root S^", one can 
generate a new set of sigma points {X^^-}^to instant k with respect to the 
quartet (a, A, x^, S^"), and compute the associated weights {Wk^i}'^to- Then 
by propagating j}^=o forward, one can start a new assimilation cycle at 
instant k + 1. 

4.5.3 Summary of the scaled unscented Kalman filter 

We summarize the main procedures of the SUKF as follows: 
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Propagation step: 



x^^, = Mk,k-i , ^ = 0, • • • , 2Lfe_i , 

=0 

V^^i.o (4,0 - xa , (x^,, - xa , 



Sx 
k 



- X, 



= V'w^ (xU) - y.) , (H, (x^ J - Yk) , 

■per /q'^^^ 



(4.48a) 
(4.48b) 

(4.48c) 
(4.48d) 

(4.48e) 
(4.48f) 

(4.48g) 
(4.48h) 
(4.48i) 
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Filtering step: 



x^ = x^ + K, {y,-n, (4)) , 




Analysis scheme: 
Sigma points: 



'^k,0 — -^k 1 



XI, = + « v/Z^ (SD, , ^ = 1, 2, • • • , , 
X^, = x« - ayjL, + A (Sr),_^^ , i = + 1, Lfc + 2, 

Associated weights: 
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4.6 Reduced rank scaled unscented Kalman 
filter for high dimensional systems 

In practice, one may not wish to apply the SUKF directly to high dimen- 
sional systems. To see this, recall that in the generation of sigma points, 
one requirement is that the number of sigma points be larger than twice the 
dimension of the system under assimilation in order to avoid rank deficiency. 
This may be infeasible, and sometimes actually unnecessary for data as- 
similation in high dimensional systems. Therefore, some modifications have 
to be introduced to the SUKF in high dimensional systems. To do this, we 
follow the idea in [55]. We perform a truncated singular value decomposition 
(SVD) on a covariance matrix, as described below, to generate sigma points 
with controlled numbers. The SUKF producing sigma points in this way will 
be called the reduced rank SUKF, which will be used in subsequent numeri- 
cal experiments. For convenience, we may sometimes use "SUKF" to mean 
the "reduced rank SUKF" when it causes no confusion. 

Without loss of generality, we assume that at time instant k — 1, we have 
a set of (2/fc_i + 1) sigma points, in terms of = {^k-.i^i}'^'^Q^ with the 
corresponding weights = {W^-i.ijjJ'o'^' where the choice of Ik-i will be 
discussed later. 

^Because some of the system states may be correlated, so that the covariance of the 
system states itself may not be a full rank matrix. 
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4.6.1 Propagation and filtering steps 

The procedures at the propagation and filtering steps of the reduced rank 
SUKF are the same as those of the SUKF. We first define a set of forecasts 
of the propagated sigma points 

= {xt, : 4, = Mk,k-i (XUi) , ^ = 0, ■ ■ ■ , 2/,_i} , (4.52) 

based upon which the background sample mean x^, sample covariance 
and the Kalman gain can be computed according to the formulae in 
§ I4.5.H but with Lk-i therein replaced by Ik-i- Then the analysis mean x^ 
and covariance are updated using the formulae in § I4.5.2I 

4.6.2 Analysis scheme 

To generate a set of sigma points A"^ = {^kfi^''' y'^k,2ik} "^i^h controlled 
number, the truncated singular value decomposition (SVD) is conducted on 
P^. Let P^ be an m X m matrix, then it can be decomposed as 

P^ = E^D^ (E^)^ , 

where = diag((T^ n ' ' ' ? CTfc m) a diagonal matrix consisting of the eigen- 
values cr^ j of P^, which are arranged in descending order, i.e., al ^ > al j > 
for i > j, and E^ = [e^ i, ■ ■ ■ , is the matrix consisting of the correspond- 
ing eigenvectors e^^j. A new set of sigma points Xj^ = [Xj^^, ■ ■ ■ ,^1^21^} 
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(4.53) 



then generated as follows: 



a 



= X' 



a 



k,0 



'k,i 



a 



Xfc + a (4 + A) ak,iek,i , i = l,--- Jk, 



(4.54) 



k,i 



a 



Xfc - a ih + A) ak,i-if^ek,i-i^^ , i = lk + I, - ■ ■ , 2/fc, 



where Ik is an integer to be specified. Note that using the new sigma points 
as the analysis ensemble, the sample mean of sigma points is equal to x^. 
Thus the SUKF is an unbiased ensemble filter according to the definition in 
[?T] (also see the discussion in § I3.3.2p . 

It is worth noting that, Eq. fl4.54p only requires the first Ik pairs of eigen- 
values and eigenvectors, rather than the full spectrum. Therefore, to reduce 
the computational cost in high dimensional problems, some fast SVD algo- 
rithms, e.g., the Lanczos or block Lanczos algorithm (cf. [20j and [29[ ch 9]), 
can be adopted to compute the first Ik pairs of eigenvalues and eigenvectors 
only (for example, see [81]). This may reduce the computational cost of the 

SUKF in high dimensional systems 0. 

®To see this, we consider a simple scenario, where the m-dimensional dynamical system 
is given by x^+i = Ax^. Here A is supposed to be a full rank matrix (otherwise the model 
size can be reduced). Then the computational complexity of propagating one state point 
forward is in the order of m^, denoted by 0{m^). Therefore for the full rank SUKF, the 
computational complexity of propagating all sigma points forward is 0(jn'^). In contrast, 
for the reduced rank SUKF, by using the Lanczos algorithm (or its variants) to compute 
the eigenvalues and eigenvectors, the computational complexity of one iteration is at most 
0{m^), or even less if A is a sparse matrix [20, p. 35]. Thus to evaluate the first Ik pairs of 
eigenvalues and eigenvectors, the computational complexity is Ik x n'* x 0(r7??), where fi'* 
is the average number of iterations in executing the Lanczos algorithm. The computational 
complexity of evolving 2lk + 1 sigma points forward is {2lk + 1) x 0{m?). Therefore, for the 
reduced rank SUKF, the overall computational complexity of generating sigma points and 
propagating them forward is [Ik x (n** + 2) + 1] x 0{'m?), which can be (much) less than 
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For convenience, we call Ik the truncation number (at time k), which 
influences the performance of the reduced rank SUKF. To see this, we let 

Pt = J2cTlekAek,if , (4.55) 

i=l 

which can be considered as an approximation to the matrix 

m 

Pt = Y.cTlekAek,if ■ (4.56) 
1=1 

If Ik is too small, some important information of P^, in terms of al je^ j (e^. j)"^ 
for i > Ik, will be lost. However, as the computational cost is also a concern, 
it is not desirable for Ik to get too large. Moreover, in many situations, 
if Ik is large enough, al may be already very small compared with the 
leading eigenvalues. Thus the improvement obtained by further increasing Ik 
becomes negligible. In this sense, one may choose a moderate value for Ik to 
achieve a tradeoff between accuracy and efficiency. In our implementation, 
to prevent Ik getting too large or too small, we also pre-specify some upper 
and lower bounds, denoted by k and respectively, to guarantee that Ik falls 
within an acceptable range h < h < lu [55] 1^. 

Another point to note is that, the approximate matrix based on the 

0{m^) in some high dimensional systems, e.g., a weather forecasting model with millions 
of state variables (or even more), while the sizes of Ik and n** may be in the orders of 10^ 
and 10'^, respectively, or even less (as an example, see [S^ for the convergence of a Lanczos 
algorithm) . 

^The choice of U and lu itself may depend on our experience and needs in running the 
system under assimilation, and hence is case-dependent in general. 
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set of sigma points = {'^kfiy''' j^k,2ik} rank 4 because of the 

symmetry in sigma points. But at the next assimilation cycle, the back- 
ground covariance P^+i evaluated based on the propagated sigma points 

^i+i = {Mk+i,k{Xkfi), ■ ■ ■ ,Mk+i,k{Xk,2ik)} "^^y ^^^6 ^ ^^^k higher than 4 
for a nonlinear transition operator A4k+i,k- This is because, in the set Xj^ = 
{^k,o^ ■ ■ ■ 5 ^k,2ik} there exists redundant information due to the symmetry in 
sigma points. But after propagation, the symmetry will normally be broken 
thanks to the nonlinearity of The set X^_^^ = {Mk+i,kiXl^,o)r ■ ■ , -^fe+i,fc('^fc',2«J} 

may thus explore more information of Aik+i,k than any one of its (strict) sub- 
sets does. Therefore the rank of P^^^ can be higher than that of P^. On 
the other hand, as illustrated in § 13.3.3.21 (cf. Fig. 13. 5p . by conducting co- 
variance filtering one can in effect increase the rank of a sample covariance. 
For these two reasons, one may conduct SVDs on the analysis covariances 
without worrying about the deficiency of their ranks. 

In principle, the choice of the truncation number l^. may be determined 
by the geometry of a dynamical system in phase space. Take a dynamical 
system with a chaotic attractor as an example, the attractor dimension (e.g., 
the Hausdorff dimension) may be substantially lower than the (topological) 
dimension of the dynamical system. Suppose that at the k-th assimilation 
cycle, the local dimension of the trajectory around the analysis is d^, then 
in principle one can choose to be around m.in{dk, lu), where min(a, b) means 
the minimum between a and b, and lu is an acceptable upper bound of Ik for 
practical computation. Therefore, if dk is not too large {dk < lu), one can 
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let Ik be close to so that the number of sigma points is about 2dk + 1. In 
this case, the number of sigma points is not too large, but the approximation 
matrix captures the structure of well such that P^ ~ P^. However, if 
the local attractor dimension still appears too large {dk > ly) for the purpose 
of computation, the upper bound will work to prevent the number of sigma 
points {2lu + 1) getting too large, but at the cost of deteriorating the quality 
of covariance approximation. 

In practice, it is infeasible to compute the local attractor dimension dk at 
each assimilation cycle. One may instead use some ad hoc criterion to choose 
the value of Ik- In our implementation, we let Ik be an integer such that 

(j^i > trace (P^) /r^ , i^l,--- ^k, 

^ ^ (4.57) 
(j^ , < trace (p«)/rfc, i>lk + l, 

where is the threshold at the k-ih cycle (we will discuss how to choose 
later). This is equivalent to saying that we generate sigma points based on 
the eigenvectors whose corresponding eigenvalues are larger than a specified 
tolerance. 

Under the assumption of Gaussianity, it can be verified that the per- 
turbations of sigma points with respect to their center X^q, in terms of 
a{lk + ^y^^o'k,i^k,i ior i — 1, • • • ,lk, are equally likely in the sense that their 
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probabilities, in terms of 

p(5x) = (27r)"/^ (detP^)-V2exp |-1 {6^f (p^)"' (5x)| , (4.58) 

are the same (also see the discussions in [HZ!), where det • means the deter- 
minant of a matrix. Therefore it is natural to assign an identical weight to 
all the perturbations. Consequently, in the spirit of Eq. (14.391) . the weights 
associated with sigma points are allocated as follows: 

« {k + A) a 

2a^ [Ik + A) 

In summary, the analysis scheme of the reduced rank SUKF is given as 
follows: 

Generation of sigma points: 

■ya _ ' a 

= + a (Z, + A)^/' (Tfc,,efc, „ ^ = 1, ■ ■ ■ , 4, (M 

1 /2 

= - a (4 + A) (rk,i-ikGk,i-ik^ i = Ik + I, - ■ ■ , 24, 
Allocation of associated weights: 



_ A ^1 

a (tfc + A) a'' 

2a^ (/fc + A) 
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(4.60) 



4.7 Example: Assimilating the 40-dimensional 
Lorenz-Emanuel 98 system 



4.7.1 The testbed and the measures of filter perfor- 
mance 

The dynamical and observation systems are the same as those in § I3.4.1I 
That is, the dynamical system (LE 98) is governed by 

diX ' ^^^^^ 

= {Xi+i - Xi^2) Xi-i -Xi + 8, for i = 1, ■ ■ ■ , 40 , (13 .4411 

at 

while the observation system is 

Yfc = Xfc + Vfc , (I3.45P 

where follows the Gaussian distribution N {\k '■ 0,1). 

We integrate the dynamical system Eq. fl3.44p through a fourth-order 
Runge-Kutta method (HH Ch. 16]. We choose the length of the integration 
window to be 100 dimensionless units, and the integration time step to be 
0.05. For notational convenience, we denote this setting by : 0.05 : 100. 
Similar notations will also be used later. We make the observations of the 
dynamical system at every integration step. 

We also adopt the relative rmse Cr and rms ratio R defined in § 13.4.21 
as the measures of filter performance. In the context of the reduced rank 
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SUKF, they are given by 

k=l 

and 

1 |,-(24 + i)K-xrii. 

ll^fc.j -^fc l|2 

i=l 

respectively. 

According to Eq. fl3.5ip . the expectation Rg of the rms ratio is 

Re = ^{leff + l)/{2leff + l) 

by letting n = 2/e// + 1 in Eq. fl3.5ip . where lejf is the "effective" truncation 
number over the whole assimilation window. Hence, if the true system states 
are statistically indistinguishable from the corresponding sigma points, the 
values of R and Re will be close to one another. Note that Re ~ 0.71 for 
a sufficiently large /e//. For simplicity, we let lejf equal the average of the 
truncation number I, i.e., /g// = I = Yli=r ^k/kmax- Again, R > Re means 
that a sample covariance of sigma points underestimates the error in state 
estimation, while R < Re overestimates the error in state estimation [611 [89] . 
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4.7.2 Some issues in implementation 

4.7.2.1 Positive semi-definiteness of the covariance matrices 

One issue in implementing the SUKF is to guarantee the positive semi- 
definiteness of the covariance matrices. To this end, first of all we require 
/fc + A > so that in Eq. ( ]4.54p the square root of Ik + X is real. Also note, 
when computing the covariances (cf. Eqs. fl4.42cp . fl4.43bl) and fl4.43cp ). the 
effective weight of x^_(^j^ q is Wkfl + I + P — a"^ {P > 0). So we also require 
that Wk,o + 1 + /3 — > 0, which, together with Eq. (I4.60p . is equivalent to 
saying 

^ +1-^ + 1+13- a'^>0. (4.62) 



Ik may take different values at different assimilation cycles. However, since 
Ik is bounded such that < k < Ik < lu, with some algebra, one can obtain 
the sufficient conditions 

A > -// + — ^ , 

(2 + /?)' 



a>\l2 + (3-^/{2 + (3f-j^, (4.63) 



a<xl2 + f3+J{2 + f3f- 



which guarantee the positive semi-definiteness of the covariances. 
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4.7.2.2 The choice of the threshold 

The choice of the threshold F^, follows the work [55]. We begin by specifying 
a threshold Fi at the first assimilation cycle. If Fi is a proper value such 
that the corresponding truncation number li satisfies k < h < lu, then we 
keep Fi and at the next cycle we start with F2 = Fi. If Fi is too small, 
so that li < li, then we increase it by replacing Fi by l.lFi + 200. We 
continue the replacement until h falls into the specified range, or the number 
of replacement operations reaches 30 (in which case we simply put h = li, 
regardless of the value of Fi). Similarly, if Fi is too large, so that li > lu, then 
we decrease it by replacing Fi with Fi/1. 1—200. We continue the replacement 
until Zi falls in the specified range, or the number of the operations reaches 
30 (in which case we simply put li = lu)- After the adjustment, at the next 
cycle we start with F2 = Fi and adjust it (if necessary) to let I2 fall into the 
specified range, and so on. 

4.7.3 Numerical experiments and results 

4.7.3.1 Effects of the inflation factor 5 and the length scale Ic on 
the performance of the reduced rank SUKF 

To improve the performance of the reduced rank SUKF, we also adopt the 
covariance infiation and filtering techniques introduced in § I3.3.3[ Here we 
first examine the effects of the inflation factor 6 and the length scale Ic on 
the performance of the SUKF. The parameters in the experiments are set as 
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Figure 4.2: The relative rmse of the SUKF as a function of the inflation 
factor 6 and the length scale Ic- 

follows: the inflation factor 5 increases from to 10, with a fixed increment 
of 0.5 each time. We denote this setting by : 0.5 : 10. The length scale Ic 
increases from 10 to 400, with a fixed increment of 20 each time. This setting 
is thus denoted by 10 : 20 : 400. Other (fixed) parameter values are a = 1, 
P = 2, X = —2, lower bound li = 3, upper bound = 6, and the threshold 
at the first assimilation cycle Fi = 1000. 

To begin the assimilation, we randomly choose an initial condition to start 
a control run, and so obtain the true trajectory within the specified assimila- 
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tion window. We then add some Gaussian noise drawn from the distribution 
(vfc : 0, 1) to the true trajectory to generate the observations. The noise 
level (relative rmse) of the observations e"^"" ~ 0.22. To start the SUKF, we 
also generate 6 randomly perturbed initial conditions ^ as the background 
ensemble at the first assimilation cycle. This represents a typical scenario in 
data assimilation, where the ensemble size of the background is often (much) 
smaller than the dimension of the dynamical system. Note that, at the first 
cycle, there are no sigma points propagated from the previous cycle. Thus 
at the first assimilation cycle, we use the ensemble transform Kalman filter 
(ETKF) introduced in the previous chapter to update the sample mean and 
covariance of the background to the corresponding statistics of the analysis, 
and then generate sigma points accordingly. After propagating sigma points 
forward, the SUKF can start running recursively from the second assimilation 
cycle. 

First we examine the performance of the SUKF in terms of the relative 
rmse. We plot the relative rmse of the SUKF as a function of the inflation 
factor 5 and the length scale Ic in Fig. 14.21 As one can see, when fixing /c, the 
relative rmse of the SUKF also exhibits the U-turn behaviour as 5 increases. 
This phenomenon was already explained in § 13.4.41 On the other hand, when 
fixing 5, and provided that 5 is not large (say, 5 < 2), the relative rmse is 

roughly insensitive to the change of Ic- For 2 < 5 < 4, the relative rmse 

^°Given the truth x^^'* at the first assimilation cycle, these 6 different initial conditions 
are just the samples drawn from the distribution N (v^ : x*'*, l). 
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exhibits the U-turn behaviour as Ic increases. For S > 4, the relative rmse of 
the SUKF tends to decrease overall as Ic increases. 

Comparing Fig. 13.71 with Fig. 14. 2^ the SUKF does not consistently out- 
perform the ETKF. This might be due to the following reasons. 

(a) The states of a nonlinear system do not strictly follow a Gaussian distri- 
bution, which violates the Gaussianity assumption in nonlinear Kalman 
filters (see §[3X2]); 

(b) The covariance filtering technique works better for the ETKF than for 
the SUKF (see the discussion below); 

(c) The amount of information in use (also see the discussion below). 

In fact, a closer examination on Fig. 13. 71 indicates that, the SUKF appears 
to be "less dependent" on the covariance filtering technique than the ETKF, 
in the sense that, to achieve a lower relative rmse (e.g. < 0^, the length 
scale Ic of the SUKF tends to be larger than that of the ETKfH The SUKF 
also appears to have a broader region than the ETKF where the filter does 
not diverge (i.e. Cr < ef" ^ 0.22). 

A possible explanation of the above difference between the ETKF and the 
SUKF may be given based on the accuracy analysis in Appendix [Cl where 
we show that in the EnKF (including the ETKF), because of the effect of 



^^From the discussion in § I3.3.3.2| one can see that, given a covariance matrix P, the 
components of the correlation matrix ^ will be closer to 1 for a larger length scale Ic, 
which means that the Schur product P o $ is closer to the original matrix P. In the 
extreme situation such that Ic — +oo, one has P o $ = P, which means that introducing 
covariance filtering does not change the covariance matrix P at all. 
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Figure 4.3: The relative rms errors of the SUKF and the ETKF as functions 
of the covariance inflation factor 6, but without any covariance filtering (/^ = 
oo). Here the experiment setting of the SUKF is almost the same as that 
specified at the beginning of § I4.7.3.H except for that in one experiment 
(corresponding to the dash-dotted line in blue marked by squares), the initial 
ensemble size of the background is = 6, with the lower bound /; = 3 and 
the upper bound /„ = 6, while in another experiment (corresponding to the 
dash-dotted line in black marked by diamonds), the initial ensemble size of 
the background is n = 10, with /; = 10 and /„ = 13. The experiment setting 
of the ETKF is also almost the same as that in § 13.4.41 but with the initial 
ensemble size of the background n = 7 in one experiment (corresponding 
to the dash-dotted line in red marked by asterisks), and n = 14 in another 
experiment (corresponding to the dash-dotted line in green marked by plus 
signs) . 
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finite ensemble size, some spurious modes and bias exist in the estimation 
of a sample covariance. Thus the covariance filtering technique works well 
to reduce the effect by choosing relatively small length scales. In contrast, 
because of the symmetry in sigma points, those spurious modes and bias in 
the EnKF do not appear in the SUKF. Thus there is no need to change a 
sample covariance of the SUKF as much as that of the ETKF. Hence larger 
length scales will work better for the SUKF. 

In Fig. 14.31 we examine the situation where there is no covariance filter- 
ing conducted on both the SUKF and the ETKF. Note that in the SUKF, 
given 2lk + 1 sigma points, only the first /fc + 1 sigma points contain useful 
information because of the symmetry in sigma points (the information from 
the last Ik sigma points are redundant). However, by propagating all sigma 
points forward through a nonlinear function, 2/^ + 1 propagated sigma points 
may explore more information about the nonlinear function compared with 
the choice of propagating + 1 sigma points forward only. Indeed, from 
Fig. 14.31 one can see that, if the upper bound Z„ of the SUKF is equal to the 
ensemble size n of the ETKF minus one, so that either + 1 = n = 7 or 
+ 1 = n = 14 in Fig. 14. 3[ the SUKF always outperforms the ETKF. On 
the other hand, if the ensemble size n in the ETKF is about equal to twice 
the upper bound lu plus one, for example, n = 14 and lu = Q in Fig. 14. 3[ 
the performance of the SUKF with /„ = 6 is still comparable to that of the 
ETKF with n = 14. Therefore in some situations, if it is inconvenient or 
expensive to produce background ensembles, the SUKF may be adopted to 
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Figure 4.4: The rms ratio of the SUKF as a function of the inflation factor 
6 and the length scale Ic- 

improve the performance of data assimilation. 

Next we examine the rms ratio of the SUKF. We plot the rms ratio of 
the SUKF as a function of 6 and Ic in Fig. 14. 4[ As one can see there, when 
fixing Ic, if Ic is not too large (say Ic < 30), the rms ratio R tends to decrease 
as 6 increases. If Ic is relatively large (say Ic > 30), the rms ratio R exhibits 
the f/-turn behaviour as 6 increases. On the other hand, when fixing 6, if 5 
is not too large (say 6 < 1), the rms ratio appears insensitive to the change 
of Ic- But as S increases above 2, the rms ratio R also exhibits the t/-turn 
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behaviour. To make the analysis ensemble (sigma points) indistinguishable 
from the truth (i.e. R ~ 0.71), one should take the parameter values of 5 
and Ic within the strip between the contour levels of 0.7 and 0.8. However, 
overestimation of the analysis covariance (i.e. R < 0.71) can in fact improve 
the performance of the SUKF in the sense that it can achieve lower relative 
rms errors, the same as the phenomenon observed in the ETKF (cf. Fig. l3.9p . 

4.7.3.2 Effect of the scale factor a on the performance of the re- 
duced rank SUKF 

Now we examine the effect of the scale factor a on the performance of the 
SUKF. For a more thorough examination, we also include the covariance 
inflation factor 6 as another variable parameter, although in the previous 
experiments we have already studied its effect. The scale factor a and the 
inflation factor 6 take values from the sets 0.8 : 0.2 : 2.4 and : 0.5 : 10, 
respectively. The values of the other parameters are: (3 = 2, X = —2, length 
scale Ic = 240, lower bound k = 3, upper bound = 6, and the threshold at 
the first assimilation cycle is Fi = 1000. 

We first plot the relative rmse of the SUKF as a function of a and 6 in 
Fig. 14.51 When fixing 6, and if 6 is not too large (say, 6 < 3), the relative 
rmse is insensitive to the change of a If 5 is large (say, 6 > 8), then 

^^This phenomenon has also been found in other experiments, see, for examples, 
Figs. 13.61 13.71 and 14. 2( where the common feature is that, when the covariance infla- 
tion factor 6 is small, the relative rmse of the filter (either the EnKF or the SUKF) is 
roughly insensitive to the change of the other parameter in test. One possible explanation 
to this phenomenon is that, when 6 is small, the error covariance of the background is 
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Figure 4.5: The relative rmse of the SUKF as a function of the scale factor 
a and the inflation factor 5. 

the relative rmse exhibits the U-turn behaviour as a increases, which can 
be explained from the following point of view. Comparing the true error 
covariance, given by Eq. fl4.16bl) . with the estimated error covariance of the 
SUT, given by Eq. fl4.30bl) . we see that a plays a role similar to that of the 

covariance inflation factor b. If a < 1, the error covariance of the SUT is 

underestimated, so that the background will dominate the computations of the sample 
mean and covariance of the analysis, while the influence of the incoming observation is 
not significant. Therefore, the relative rmse does not change too much for relatively small 
inflation factors. 
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Figure 4.6: The rms ratio of the SUKF as a function of the scale factor a 
and the inflation factor 8. 

underestimated. If a > 1, the error covariance of the SUT is overestimated, 
which can therefore improve the performance of the SUKF, provided that a 
is not too large. 

Next we examine the rms ratio of the SUKF, which we plot as a function 
of 5 and a in Fig. 14.61 When fixing 5, and if b is not too large (say 5 < 2), 
the rms ratio will decrease as a increases. If b is larger (say 5 > 4), the 
rms ratio R also exhibits a U -turn behaviour as a increases. To make sigma 
points indistinguishable from the truth (i.e., R ~ 0.71), one should take the 
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parameter values of S and a within the strip between the contour levels of 
0.7 and 0.8. However, overestimation of the analysis covariance (i.e., R < 
0.71) can also improve the performance of the SUKF, just as in the previous 
experiments. 

4.7.3.3 Effects of the threshold Fi and the bounds li, lu on the 
performance of the reduced rank SUKF 

Here the experiments are designed to examine the effects of the threshold Fi 
and the bounds //, lu on the performance of the SUKF. Since the rms ratio 
is only a qualitative measure of filter performance (e.g., underestimation 
or overestimation of the error covariance), we will henceforth only use the 
relative rmse to examine the performance of the SUKF. 

In the first experiment, we let the covariance inflation factor 5 = 0, the 
length scale Ic = 240, the initial ensemble size n = 4, and take a = 1, (3 = 2 
and A = —2. We fix the upper bound lu = 6, but vary the lower bound such 
that // takes values from the set 3:1:6. We also vary the threshold Fi such 
that the logarithmic function log^oTi takes values from the set 2 : 0.5 : 5.5 

s 

We show the numerical results in Fig. 14.71 . Intuitively, the larger the 
threshold Fi and the bound li, the larger the truncation number Ik tends 
to be, which, however, does not guarantee a better performance in terms of 

the relative rmse. Indeed, in Fig. 14. 7[ the optimal threshold log^^Q Fi = 3 is 

^■^This range represents the moderate values of Fi in our choice so as to make the 
truncation numbers Ik neither too large nor too small. 

148 



1.12 



1.1 



1.08 



1.06 



1.04 



1.02 



0.98 



0.96 





Lower bound 


= 3 


-i>- 


Lower bound 


= 4 




Lower bound 


= 5 


-D- 


Lower bound 


= 6 



2.5 3 3.5 4 4.5 

Threshold r.| (in the scale of log^g) 



5.5 



Figure 4.7: The relative raise of the SUKF as a function of the threshold Fi 
(in the scale of log^o) with different lower bounds li. 

the same for lower bounds of li = 3, 4, oIj, while thresholds larger than this 
value will result in larger relative rnis errors. For the lower bound = 6, its 
relative rms errors are smaller than, or at least approximately equal to those 
of the bounds li = 3, 4, 5 in most cases. However, for log^^Q Fi = 3, the relative 
rmse for // = 6 is higher than the other cases. To explain this phenomenon, 
we conjecture that, too small a truncation number Ik is not likely to achieve 



^^li = lu = Q means Ik = Q a.t every cycle, so the threshold Fi does not affect the value 
of Ik in this case. 
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a performance as good as a modest value because it means poor quality of 
covariance approximation. In contrast, too large a truncation number Ik also 
does not necessarily achieve a better performance than a modest value. This 
is because, if a covariance of the system states is not a full rank matrix, 
too large a truncation number may introduce some spurious structures from 
the null space of SVD into sigma points, which are then treated as equally 
likely as the other sigma points, and propagated forward to the next cycle. 
The effect of the spurious structures may be accumulated and eventually 
deteriorate the overall performance. 

In the second experiment, we let the covariance inflation factor 5 = Q, the 
length scale Ic = oo (no covariance filtering), the initial ensemble size n = 10, 
the initial threshold Fi = 1000, and we take a = 1, (3 = 2 and A = —2. We 
fix the lower bound = 3, but take the values of the upper bound lu from 
the set 6 : 1 : 40. 

Fig. 14.81 show the relative rmse as a function of the upper bound lu- 
As one can see, the relative rmse also exhibits the U-turn behaviour as lu 
increases, as for the ETKF in Fig. 13.101 A possible explanation of this 
phenomenon may be the same as the argument in the first experiment, that 
is, some of sigma points are actually obtained from the null space of SVD, 
which cannot be evaluated and propagated as equally as the other sigma 
points, otherwise spurious structures will be introduced so as to deteriorate 
the performance of the SUKF. Similar arguments can be applied to explain 
the U-turn behaviour of the ETKF in Fig. I3.10[ since the square root of an 
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Figure 4.8: The relative rmse of the SUKF as a function of the upper bound 

error covariance, although not necessarily obtained through a SVD, is also 
involved in the ETKF. 

4.7.3.4 Effects of the parameters A and P on the performance of 
the reduced rank SUKF 

Finally we examine the effects of the parameters A and (3. In the experiments, 
we let 5 = 0, Ic = 240, a = 1, = 3, = 6, Fi = 1000, and we take 
the initial ensemble size n = 4. We consider four different scenarios with 
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P = 0,2,4,6 respectivel}cj, and compute 20 values of A in each case. To 
guarantee the positive semi-definiteness of the sample covariances, we start 
with A = —(3li/{l + (3), and increase A by AA = 1 each time. In particular, 
when /5 = and A = 0, the effective weight W^fi + /3 of the ensemble mean 
equals zero for any k. Therefore, in this case, the SUKF can be considered 
as the EnKF equipped with the analysis scheme of positive-negative pairs 
(PNP) (cf [87] and the references therein). 

We plot the numerical results in Fig. 14. 9[ As one can see, when j3 increases 
from to 6, the minimum relative rmse for a given value of (3 decreases. This 
may be interpreted as follows: as pointed out in § 14.3.11 a positive value of 
will increase the error covariance, which is similar to the covariance inflation 
technique introduced in § 13.3.3. H and so a larger value of (3 tends to result 
in a smaller relative rmse, provided that (3 is not too large (otherwise the 
U-turn behaviour may appear). 

However, for each fixed /3, there is no clear trend indicating the optimal 
value of A. A larger value of A does not imply a smaller relative rmse, or 
vice verse. As an explanation of this phenomenon, we note that, with the 
other parameters being fixed, A determines the relative weights between the 
sample mean and the other sigma points (cf. Eq. (14.601) ). If the underlying 
system is linear, then in principle we can compute the optimal relative weights 



^'"'To guarantee the positive semi-definiteness of sample covariances, tlie values of A will 
depend on the choice of fi. Thus it is inconvenient to plot a contour plot with the relative 
rmse as a function of (3 and A. For this reason, we only single out four values of (3 for 
study. 
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(d) Relative rmse vs A with /3 = 6 



Figure 4.9: Effects of the parameters P and A on the performance of the 
SUKF. 



between the sample mean and the other sigma points (under the assumption 
of Gaussianity), and so determine the optimal value of A. Nevertheless, the 
existence of nonlinearity may make the problem intractable. For nonlinear 
systems, the optimal relative weights (hence A) may vary from cycle to cycle. 
However, to search for the optimal parameter A at each assimilation cycle 
will be computationally expensive. Thus in our experiments, we chose to 
fix A within the same assimilation window, so that the same value of A is 
used at each assimilation cycle. In doing this, the fixed A cannot capture 
the variation of its optimal values at different assimilation cycles, therefore 
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it may be difficult to find a clear trend of its optimal value in Fig. fl4.9l) . 



4.8 Summary of the chapter 

In this chapter we introduced the basic idea of the unscented transform (UT) 
and its extension, the scaled unscented transform (SUT). We conducted an 
accuracy analysis for the UT via Taylor series expansions. We also showed in 
Appendix O that, under the assumption of Gaussianity, the UT can achieve 
better accuracy than the EnKF (including the ETKF). 

Incorporating the UT or the SUT into the propagation step of recursive 
Bayesian estimation (RBE) will lead to the unscented Kalman ffiter (UKF) 
or the scaled unscented Kalman filter (SUKF), respectively. In practice, 
however, one may not wish to apply the UKF or the SUKF directly to high 
dimensional systems, since the computational cost in those circumstances 
will become very expensive. To this end, we introduced the reduced rank 
SUKF to reduce the computational cost. 

For illustration, we took the 40-dimensional LE 98 system as the testbed 
to demonstrate the details in implementing the reduced rank SUKF. We also 
investigated the effects of the intrinsic parameters (e.g., a, /3, A etc) on the 
performance of the ffiter. Currently, there are no theoretical grounds that can 
be used to determine the optimal values of these filter parameters in general 
situations. The experiments conducted in this chapter may provide some 
insights into how these parameters affect the performance of the reduced 
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rank SUKF. 
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Chapter 5 

Divided difference filters for 
data assimilation 

5.1 Overview 

The divided difference filters (DDFs) are similar to the extended Kalman 
filter (EKF). At the propagation step, the DDFs also involve a local expan- 
sion of a nonlinear function, not via a Taylor series expansion as in the EKF, 
but through Stirling's interpolation formula. The advantage of adopting this 
formula is that the computation does not involve the derivatives of a non- 
linear function. Instead, one uses divided differences for approximation and 
thus can avoid the difficulty in the EKF. By adopting Stirling's interpolation 
formula, one also needs to generate sigma points as in the scaled unscented 
Kalman filter (SUKF). Thus although the DDFs and the SUKF are derived 
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from different points of view, they are similar to one anotlier in many aspects, 
as will be shown later. 

This chapter is organized as follows. In § 15.21 we state the problem of in- 
terest. Then we proceed to introduce Stirling's interpolation formula in § 15.31 
as the approximate solution to the recast problem in Fig. 14.11 Incorporat- 
ing this formula into the propagation step of recursive Bayesian estimation 
(RBE) leads to the DDFs, as will be introduced in § I5.4[ To reduce the com- 
putational cost, we introduce the reduced rank DDFs in § 15.51 In § l5.6l we use 
the 40-dimensional Lorenz-Emanuel 98 model as the testbed to illustrate the 
details in implementing the reduced rank DDFs, and investigate the effects 
of filter parameters on the performance of the DDFs. Finally, we draw our 
conclusions for this chapter in § 15.71 
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5.2 Problem statement 



Consider the data assimilation problem in the systems described by Eq. (13. ip : 



Xfc = Mk,k-i (xfc-i) + Ufc , P-laP 

Yk = Hk (xfc) + Vfc , P-lbp 

Ufc ~ iV (ufc : 0, Qfc) , (EIED 

V, ~ iV (vfc : 0, Rfc) , dUdD 

E (u,u^) = 4,,Qfc , (EUD 

E(u,vj)=0 V^,j. ([3Ji 



We first discuss how to solve the recast problem in Fig. 14.11 through Stir- 
ling's interpolation formula. Then we apply this formula to the propagation 
step of RBE to derive the DDFs. 

5.3 Stirling's Interpolation Formula 
5.3.1 Basic idea 

The ideas and analyses presented here and in § 15.41 mainly follow the works 

[sllElliS]. 

We first re-state the estimation problem in Fig. 14.11 Let x be an m- 
dimensional Gaussian random variable such that x ~ A^(x:x, Pj.). We 
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transform x by a nonlinear function T to give a transformed random variable 
ri = T (x). Our objective is to estimate the mean f] and covariance P,, of r^. 

In Chapter m we have mentioned the extended Kalman filter (EKF), de- 
rived via a Taylor series expansion of JF. Alternatively, one can choose to 
expand T through Stirling's interpolation polynomials [Ml [65]. For example, 
a second-order approximation can be conducted based on the formula [HH 



where and T)1^ are the divided difference operators defined through the 
following operations [61]: 



Eq. (11-13)] 



= (x + (5x) ^ (x) + V^^T (x) + -D^J^ (x) 



(5.2) 




(5.3b) 



(5.3a) 



L L 



+ E E 5e.5e,[P.(V2)A/;(V2)][P,(/^m(V2)] LF(x). 



i=l j=l,j^i 



Here Vi{h/2) and Afi{h/2) are operators satisfying 



P,( V2)^ (x) =- (^-^ (^x + - (S.), J + - 2 ^^^^^ 
M( V2)^ (x) =^ f X + ^ (S..),) - f X - ^ (S J,") , 




(5.4) 
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with the parameter h being the interval length of interpolation, and {Sx)^ 
the i-th column of the m x L square root matrix S^; of P^;. And Sci denotes 
the i-th element of the Gaussian random variable 6e N {6e : 0, 1), where I 
is the L X L identity matrix. Therefore, 

E {5ei) = 0, E (Se^i) = 1 for i = 1, ■ ■ ■ , L . (5.5) 

Moreover, 5x = Sr^Se follows the Gaussian distribution (5x : 0, P^.). 
With some algebra, it can be shown that 

[P.(V2)]'^(x) = i[P,(/.) + l]^(x); 

mh/2)fj^{^) = 2 [V,{h) - l]^(x) ; (5.6) 
[P,(V2)Ar,(V2)]^(x) = [A/;(V2)P.(V2)]^(x) = iA/',(/i)^(x) . 



Thus Eq. (15. 3 p becomes 



Vs.T{^) ~ ((5e)'^Ar(/i)^(x) , (5.7a) 



^^L^(x) 



\ i=l 



Ah 

L L 

+ Y. H 5ei5e^Mi{h)M,{h)] T {yi) 

i=l j=l,jj^i 



where M{h) = [Afi{h), ■ ■ ■ ,J\fL{h)f. Note that in Eq. fl5.7bp . evaluating the 
terms '^f^i '^j=i j^i 5ei5ejNi{h)J\fj{h)T (x) requires one to generate L(L— 1) 
additional system states (apart from the sigma points to be introduced later), 
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which would be prohibitive if the system dimension m is large and we require 
L > m to avoid rank deficiency in the sample covariance. Thus to reduce 
the computational cost, we discard these terms following [M]. Therefore, 
Eq. fl5.7bp is reduced to 

i=l 

As for the scaled unscented transform (SUT), we also need to generate a 
set of special system states {A^j^^g (L > m): 

Xq = X, 

A", =x+/i(S,),, ^ = 1,2,--- ,L, (5.9) 
X, = ^-h (S,.),_L , ^ = L + 1, L + 2, ■ ■ ■ , 2L, 

which are also called sigma points. But note that here sigma points are 
generated for the purpose of function interpolation, while in the SUT, sigma 
points are particularly chosen to capture certain moments of the distribution 
of X. 

Let the transformed sigma points be {3^j : = ^ i'^i)}fto- what fol- 
lows we introduce three different approximation schemes. 
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5.3.1.1 First order divided difference approximation 

In the first order divided difference (DDI) approximation sclieme, tlie non- 
linear function T is approximated by [611 ES] : 

= (x + (5x) 

^^(X)+P5e^(x) (5.10) 

Tlierefore tlie estimated mean i) is given by 

r} = E (x) + V,,T (x)] = ^ (x) = 3^0 . (5.11) 
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Similarly, the estimated covariance 



= E (ry - r)) (r/ - r))' 



= E 

1 

1 

~ 4^ 



1 r 1 r 1 T 



{Sey N{h)T (x) (5e)' (x 



1=1 
1 ^ 

— J] [.F (x + (S,)J -T{^-h (S,) J] [.F (x + (S,) J - ^ (x - (S, 
1=1 

L 



(5.12) 



For convenience, it is customary in practice to also compute the cross 
covariance (for evaluation of the Kalman gain in the DDFs), which is given 

by 



Pj.^ — E 



5x [r] - fjY ^ 



2h 

^ S,M{h)J- (x) 

L 



(5.13) 



i=l 



To summarize, in the DDI approximation scheme, the solution to the 
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recast problem is given by 



r/ = ^(x) = 3^o; (5.14a) 
1 ^ 

i=l 
1 ^ 

2=1 

5.3.1.2 Second order divided difference approximation 

In the second order divided difference (DD2) approximation sclieme, tlie 
nonlinear function is approximated by [Ml [65] : 



1] = {x + (5x) 

^ (5.15) 



(x) + i- (5e)^ Ar(/.).F (x) + ^ X] i6e.f ^h) (x) . 



i=l 



Thus we have the estimated mean 



1 ^ 

^ = (x) + 5^ E [i6e,f] (x) 



1=1 



^.^(x) + -^X^P.(/i).F(x) (5.16) 

i=l 



K'-L 



i=l 
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To estimate the covariance, we have 

= E (?7 - 17) (?7 



(5.17) 



To facihtate the evaluations, one may note that 

P, = E(r?-r)) (ri-fif 

= E (77 - ^ (x)) (77 - ^ (x))^ - (r? - ^ (x)) (r) - ^ (x))^ 

+ E m{h) - l)^(x)] mih) - l)^(x)]^ 

i=l 

+ E iHSe^nse,r)mih)-l)J^{R)][{v,{h)-l)J^{5.)f 

i=l j=l,jyti 

- ^ E (^(^^«)') ' [(^^(^) - 1) (^)] [(^^(^) - 1) (^)]^ 

-^E E (E(5e,)^)(E(5e,)^)[(7',(/.)-l)^(x)][(P,(/i)-l)^(x)r . 

i=l i=lj^i 

(5.18) 

Note that, to derive the above equation, we utihze the fact that the third 
order moments ^{SeiSejSek) = for arbitrary admissible indices i, j, and k, 
since we assume 5e ~ iV (5e : 0, 1). Moreover, we also have 

E{Seif{5ejf = (E(5e,)') {E{5ejf) = l,for i^j. (5.19) 
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Note that in the author chose to parameterize the term K{Sei)^ and 

set E(5ej)^ = h"^. It is in this respect that the DD2 approximation differs 
from the central (divided) difference (CD) approximation, as will be seen 
later. 

Following the choice in [BH [65] , we have 



P. = 4^(A/'W-^(x))^(Ar(/.)^(x)) 



+ 



J2mih)-i)j^i^)]mih)-i)^i^)f 



i=l 



1 ^ 



4/l2 



i=l 



i=l 



(5.20) 



Note that to guarantee the positive semi-definiteness of P^, a sufficient con- 
dition is that h > 1. 

Similarly, we have the estimated cross covariance 



(5x (?7 — 57)'' 



2h 



■E 



SJe {6ef J\f{h)J^ {5c) 



- SM{h)J^ (x) 
1 ^ 



i=l 



(5.21) 



which is the same as that of the first order approximation (cf. Eq. (15.131) ). 
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To summarize, in the DD2 approximation scheme, the solution to the 
recast problem is given by 



h'-L 



^ 2L 

1=1 

1 ^ 

i=l 

1/^ 



L 



P-'' = ^ E (Sx). (3^. - 3^L+.)^ . (5.22c) 
1=1 

5.3.1.3 Central (divided) difference approximation 

The central (divided) difference (CD) approximation scheme [37] is almost 
the same as the DD2 approximation scheme, except that it does not param- 
eterize the fourth-order moment E(5ej)^. Instead, it takes E(5ej)^ = 3, as is 
the case for the Gaussian distribution N {6e : 0,1). Thus we do not repeat 
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the derivation. Instead, we summarize the main results as follows: 



h^-L 



1 

i=l 



1 

= 4^ E - ^^+^) - ^^+^)^ 

1 ^ 

+ ^ + ^^+^ - 23^0) + ^^+^ - 23^0 



(5.23a) 
(5.23b) 



i=l 



1 ^ 

1=1 



(5.23c) 



5.3.2 Accuracy analysis 

Following Chapter HJ we conduct accuracy analyses for the divided difference 
approximation schemes. To this end, we first define perturbations {SXi}^^^ 
of sigma points around x according to Eq. ( 15.91) : 



6Xi = ( 



0, 

h (S.). , 

-his 



2 = 0; 



xJi-L ' 



i = l,--- ,L; 



i = L + 1, 



(5.24) 



2L. 
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Expanding T around x gives (cf. Eq. fl4.2ip ) 



2! 
1, 



3! 



^(x) + 8Xl\/T +2^^ 



(5.25) 



On the other hand, under the assumption that x ~ (x : x, P^.), the 
mean and covariance of the transformed random variable r/ = J^(x) are given 
by 



^(x) + i ( V^P. V) ^ + ^ E (Dt,^) + 



P^ = E 



(VJ^f P,(V^)+E 



6 



V^P.V 



6 



V^P.V 



+ 



fl4l6bl) 



5.3.2.1 Accuracy of first order approximation 



Compared with Eq. fl4.16ap . it is clear that the first order estimation of 
the mean in Eq. fl5.14al) is carried out only up to first order in the Taylor 
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series expansion, which is zero in both equations under the assumption of 
Gaussianity. 

On the other hand, note that 



y,-y,^^^2Bsx,y'+^-Dl;^^J^+--. , fori = l,.-. ,L, (5.27) 



where the even-order derivative terms vanish because of the symmetry in 
{SXij^^Q. Thus we have 

1 ^ 

= E (^^ - ^^+^) (^^ - ^^+')^ 
1=1 

1=1 ^ 

1=0 ^ ^ 

= (V^)^P. (V^) + ^ ^ I^D,^,^ (dL,^)^ + ^D^;,^^(D,^,^)4 + . 



(5.28) 

Note that in the final hne of the above equation, the terms 
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can be considered as the estimation of the terms 



E 



6 



6 



in Eq. (]4.16bll . However, an estimation of the part 



E 



V^p.v 



V^P.V 



in Eq. fl4.16bp is missing 



5.3.2.2 Accuracy of second order approximations 

In order to analyze the accuracy of the mean estimation of the second order 
approximations, one may note the equivalence between the mean estimation 
of the unscented transform (UT) and those in Eqs. f l5.22al) and f l5.23ap [64 



65] . To see this, let h = y/L + X, with A being the free parameter of the UT 
(cf. Eq. f l4.4p ). and treat the set \ — — — , ■ ■ ■ , \ as the weights of 



the propagated sigma points {3^0) 3^i) " " " > 3^2l}- Then it can be shown that 



Eqs. fl5.22ap and fl5.23ap are equivalent to Eq. fl4.7ap . Thus the accuracy 
analysis of the mean estimations of the DD2 and CD approximations just 
follows that of the UT in Eq. fOSj) . 

To analyze the accuracy of covariance estimations of the DD2 and CD ap- 
proximations, we temporally parameterize the fourth order moment E (5ei)^ = 
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(74, and let the covariance estimation be 



1 ^ 

, (5.29) 

1 

1=1 



By Taylor series expansion, one has 



yi + yL+i-2yo = -Dl;,^r+^Dt^^T+--- , for z = l,--- ,L. (5.30) 



Moreover, note that 6X0 — 0, therefore D^;^,^^ = for all j > 1. Because of 
the symmetry in 6Xi, one has 

L 2L 

i=l i=0 (5.31) 

where 

^ = E DL,-^ (DL,-^)^ - (V^P.V^) (V^P.V^)^ . (5.32) 

i=l 

Since the set of sigma points {A^I^^q in general cannot capture all of the 
fourth order moments of the random variable x, the term A may not vanish. 
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Substituting Eq. flOB and Eq. into Eq. fICTD . we have 



1 ^ 

^ 5^ (3^. - 3^L+.) (3^. - yL+^y 

i=l 



Ah* 



i=l 



Ah* 



i=l 



^ 1 ^ fD^^t-j^fD?;.;^)^ 



/l2 4 

V^P.V 



V^P.V 



6 



+ ■ ■ 



(5.33) 



where 



2/i2 ^ ) 6 /i2 4 6 

i=0 I, 



can be considered as the estimation of 



E 



6 



4 



6 



in Eq. fl4.16bll . 
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If one takes = h'^, = L + A, and Wi = 1/2(L + A) = l/2h^ for 
i = 1, ■ ■ ■ , 2L, then it can be shown that the covariance estimation of the 
DD2 approximation matches that of the UT in Eq. fl4.26l) (with /9 = 0) for 
the terms presented on the rhs of Eq. (15.331) ll- On the other hand, if one lets 
o"4 = 3, then, in general, there would be a deviation of the CD approximation 
from the UT estimation in the term Y)1^,J^ (Y)1p^_J^^'^ . 

5.4 Divided difference filters as the approxi- 
mate solutions 

Incorporating the divided difference approximations into the propagation 
step of RBE leads to the corresponding divided difference filters (DDEs). 
Without loss of generality, we assume that at time instant A; — 1, one has 
obtained the analysis sample mean x^_i and a square root ^%°Li of the error 
covariance Pfc_i- Based on these, a set of 2Lk-i + 1 {Lk-i > m) sigma points 
with respect to the triplet (/;., x^_^, S^" can be generated in the spirit of 
Eq. so that 

Xt.,, = n-1 + h (Sr_i), , 2 = 1, 2, ■ ■ ■ , Lk-i, (5.34) 
"^k-i,! = ^k-1 " h {^k-i)i_Lk-i ' ^ ^''-'^ ^''-^ + 2, ■ ■ ■ , 2Lk-i. 



^But the omitted terms on the rhs wiU not be completely the same. In fact, it can be 
verified that the covariance estimation Eq. (|5.22bp of the DD2 approximation in general 
is not equal to the covariance estimation Eq. (|4.7bp of the UT, even when f3 = 0. 
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After generating sigma points at k — 1, one propagates them forward 
through the system model. Let the ensemble of forecasts of the propagations 
be 

= {x^,, : x^,,, = Mk,k+i , i = 0, • • . , 2L,_i} . (5.35) 

Then the ensemble mean and covariance of the background can be 
estimated in a way consistent with the chosen approximation method, as 
will be shown below. For convenience, we also split the procedures of the 
DDFs into the propagation (or prediction) step and the filtering step. 
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Table 5.1: Square roots at the propagation steps of sigma point Kalman filters. 



Square Roots 



Remarks 



SUKF 



Sx 
k — 



Ko i^k i^U) - Yk) , VWi {n, (x^ J - y.) , 







DDI 


ox 


Xfc,l ^k,Lk-i+l^ ^^k,Lk-i Xfc,2Lfc_i 





DD2 



= [sf,sf] 

ca;l _ ^ 



Xfc,l Xj._^^_^_j.^, ,-^fe,Lfc_i X 



2/^2 

oh _ rc?»l C/i2l 
1 



Ai + x^ 



fc,Ljj_i+l 



fc,2Lfc_ 



^fe,2Lfe 



2x' 



fc,0 



C/ll 



CDF 



[Sf,Sf] 



2h 

Ox2 _ ^ 
oh _ \ohl c/i2] 
1 



„6 _ 



^k,L^_i+l 



' ^k,Lk-x + ^k,2Lk-x 



2/i 



[yfc,! yfe,Lfc+i' ■ ■ ■ ' yfc.Lfc yfe,2z,J 



5.4.1 Propagation step 

In the DDFs, the ensemble mean and covariance are evaluated accord- 
ing to the chosen approximation method. Specifically, 

- For the DDI filter 



xt =xt,o, (5.36a) 

b 



^'^ Ah? 



i=l 



For the DD2 filter 



— Tn, x^,o + ^ XI (5.37a) 



/i2 2/i2 

1=1 



p 



b 



^k,Lk-i+i) ^k,Lk-i+i) (5.37b) 



1=1 



i=l 

For the central difi^erence filter (CDF) 



1,2 r - ^^'^ 

. _ /l -^k-i b , 



■^fe ^4/^2 E (-^^.^ ~ ^k,Lk-i+i) {^k,i - ^k,Lk-i+i) (5.38b) 
i=l 
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For each DDF one can also re- write its background covariance in terms 
of some square roots, so that 

Pl = St{Stf = SUSlf + Qk, (5.39) 

where the square root of each DDF is hsted in Table 15.11 for convenience 
(also with the square roots of the SUKF listed there for comparison). To 
compute the square root S^*, one can let S^^ = ^JsliSff^~+~Qk , following 
the numerical scheme in § 12.3. 1[ 

If the observation operator Ti^ is nonlinear, then a divided difference 
approximation has to be conducted on Tik once again. This is because the 
background ensemble in Eq. fl5.35p is in general no longer symmetric 
about the sample mean as are sigma points {^k^i^i} in the previous 
assimilation cycle. Thus one has to regenerate sigma points with respect to 
the triplet x^, S^'') in order to conduct a divided difference approximation 
to estimate the cross and projection covariances of the background ensemble 
(cf. Eq. dEH below) 

Let Lfc be the number of the column vectors of S^''. Then one gener- 
ates another set of sigma points, Xj! = {Xl!^, ■ ■ ■ , ^1 2Lfc}' with respect to 

^In contrast, the SUKF does not regenerate sigma points at the propagation step 
because it is based on statistical approximation. Each member of the background ensemble 
is already assigned a weight for the purpose of approximation when sigma points are 
generated in the previous cycle. Thus it is not necessary to require that the ensemble 
members of be symmetric about x^.. 
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(/i,x^,Sf) as follows: 

■yb _ 

A^,^ = xt + /^(Sf)^,^ = l,■■■ (5.40) 
Xl, = yi-h (Sf )^_^^ , ^ = Lfc + 1, ■ ■ ■ , 2L,. 

Similarly, we can define a set of forecasts of the projections of the above 
sigma points 

n = {yU ■■ yU = -Hk (4.) , z = 0, ■ ■ ■ , 2L,} . (5.41) 

Then the cross and projection covariances of all the DDFs, in terms of square 
roots, can be computed as follows: 

■per cxb ( chl\^ 

— '^fc \^k ) 5 

where 

Sfc^ = ^ \y\,i ~ yfc,Lfc+i' ■ ■ ■ ' yfc.Lfc ~" yfe,2Lft] (5.43) 

is the same for all the DDFs, but has different forms, which are again 
summarized in Table ISTTl 



(5.42a) 
(5.42b) 
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Finally, the Kalman gain is given by 

^ ' 1 (5.44) 

5.4.2 Filtering step 

At the filtering step, the procedures of the DDFs are the same as those of the 
SUKF. One first computes the updated sample mean and covariance through 
the following formulae 

= xt + K, (y, - (xt) ) , dMZID 
Vl = V\-Y.^{^i^ . (j437bD 

By adopting a certain algorithm to compute a square root S^'" of P^, one 
generates a new set of sigma points, now denoted by = {A'^q' '^fc i; ' ' ' }' 
in the spirit of Eq. fl5.34p . Propagating these sigma points forward, one starts 
a new assimilation cycle at instant A; + 1. 

5.4.3 Summary of the divided difference filters 

We summarize the main procedures of the DDFs as follows: 
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Propagation step: 





- {^k,i 




(5.46a) 




ox fyb 
^k^ ^k 


= ddf (/I, Q.) , 


(5.46b) 


~ 


= ^Sl {Slf + Qk, 


(5.46c) 


^k - 


= {^k,i 


^k,i — ^ (^) ^fc^) }i=o ' 


(5.46d) 


Y^ 


- {yU 


yk,i = {^k,i}Yitl ' 


(5.46e) 




,sa = 


ddf(/i,YtRfc) , 


(5.46f) 


■per _ 
^k - 


cxb 1 chl^^ 
- "^k \pk ) 1 


(5.46g) 


^k 


= Si (st)^ , 


(5.46h) 




= Sf (S^^)^(st(S^r + R.)", 


(5.46i) 



where Eqs. fl5.46b[l and flK46f|l mean that x^, S"^^ -^^5 and are com- 
puted according to the divided difference approximation scheme in use, while 
Eq. fl5.46dl) means that the sigma points are generated with respect to the 
triplet (/i,xtSf). 



Filtering step: 



x^ = xt + K, {yt-n, (xy) 



k-^k[P,J 



Sxa 
k 



^k ■ 



(5.47a) 
(5.47b) 
(5.47c) 
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Analysis scheme: 
Sigma points: 



= {-^M : - ^ {h, SD , ^ = 1, 2, • • • } . 



(5.48) 



5.5 Reduced rank divided difference filters 
for high dimensional systems 

For the DDFs, the modification scheme is similar to that of the SUKF. The 
difference for the DDFs lies in that, one has to generate sigma points twice, 
rather than only once as in the SUKF. Moreover, as to be shown below, 
in a DDF, a truncated singular value decomposition (SVD) is conducted 
at the propagation step for the generation of sigma points, rather than at 
the filtering step as in the SUKF. For convenience of discussion, we also 
assume that at time instant k — 1, one has obtained a set of sigma points 
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5.5.1 Propagation step 

We define a set of forecasts of the propagated sigma points by 

= {xt, : 4, = MkMi i^k-i,) , ^ = 0, ■ ■ ■ , 24_i} , (5.49) 

based upon which the ensemble mean and covariance can be com- 
puted accordingly, depending on which approximation scheme is chosen (cf. 

Let be decomposed as 

Pl = ElBl{Eif, (5.50) 

where = diag(cr^ d ' " " ^^km) a diagonal matrix of eigenvalues ^ of 
P^, and = [e^^i, ■ ■ ■ , 6^,™] is the matrix consisting of the corresponding 
eigenvectors j. Then, a new set of 21^ + 1 sigma points, denoted by A"! = 
{'^fc,0' ■ ■ ■ ; '^yk,2ifc}; '^^^ ^e generated as follows: 

■yb _ 

= ±l + hak,^ek,^, i = 1, ■ ■ ■ , /fc, (5.51) 
'^k,i = ^fe ~ hak,i~i^e:k,i~ik-, = /fc + 1, ■ ■ ■ , 
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where Ik is an integer satisfying 



<i > trace (Pt) /r,, z 
a^,, < trace (P^) /r^, t 



>lk + l 



(5.52) 



with Fk being a pre-specified threshold (the values of are chosen in the 
same way as in § I4.7.2p . Moreover, we also specify upper and lower bounds, 
lu and li respectively, to prevent Ik from getting too large or too small. 

By projecting the sigma points in Eq. f l5.5ip onto the observation space, 
we have the forecasts of the projections 



Based on sigma points and their projection forecasts, we can obtain some 
approximate square roots. Specifically, we take S^* = [<Jk,iek^i, ■ ■ ■ , o-k^ii^ek^i^] 
as an approximate square root of P^. The approximate square roots to 
S^^ and S^, denoted by S^^ and respectively, are computed according to 
the formulae in Table 15.11 but with L^-i and therein replaced by Ik-i 
and Ik, respectively. The corresponding approximate cross and projection 
covariances are given by 







■■■ ,2lk}. 



(5.53) 




(5.54b) 



(5.54a) 
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Consequently, the Kalman gain is approximated by 



Sffs'i'Y (St (st] +R 



k y^k J I ^fe ^^fc 



(5.55) 



5.5.2 Filtering step 

When a new observation is available, the ensemble mean is updated as 
follows: 

x^ = xt + K,(y,-7^,(xt)). (5.56) 

To obtain a square root of the updated covariance , in principle, one 
may compute the covariance first, and then perform a matrix factorization 
through a certain numerical algorithm. To reduce the computational cost, 
however, we follow the idea in the ensemble square root filter (EnSRF) [6l 
[T3| l80l [89] to update S^'' to S^" directly. For example, using the ensemble 
transform Kalman filter (ETKF) [13], one may update the square root S^** 
via 

Sr = Sf TfcUfc, (5.57) 

where Ufc is the centering matrix in Eq. (13.341) . and is the transformation 
matrix given by (cf. § 13.3.21) : 

Tfc = Er'^(Dr^ + I)'^/% (5.58) 
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with I being the identity matrix. E^^*" and D^^'^ are the eigenvector matrix 
and the corresponding diagonal matrix of eigenvalues, respectively, of the 
weighted projection matrix P^^'', which is defined by 

= (^S^i)^R,iSf = Eror'- (E^'^)^ . (5.59) 

Note that, by using the square root Eq. (15.591) is equivalent to the original 
form in [13] if the observation operator Tik is linear, but it avoids evaluating 
the Jacobian matrix of Tik when Tik is nonlinear. 

After obtaining and S^*^, one produces 2/^ + 1 sigma points with respect 
to (^h, x^, S^" j and then propagates them forward to start a new assimilation 
cycle. 

5.6 Example: Assimilating the 40-dimensional 
Lorenz-Emanuel 98 system 

5.6.1 The testbed and the measures of filter perfor- 
mance 

The testbed and the measures of filter performance are the same as those in 
§ 14.7.11 The dynamical system (LE 98 model) is governed by 

dx ' ^^^^^ 

= {xi+i - Xi_2) Xi.i - + 8, i = 1, ■ ■ ■ , 40 , ^Mi 
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while the observation system is 



Yfc = Xfc + Vfc , (|3.45p 

where follows the Gaussian distribution (v^ : 0,1). 

We integrate the dynamical system Eq. (13.441) by a fourth-order Runge- 
Kutta method [HU Ch. 16], and choose the length of the integration window 
to be : 0.05 : 100. We make the observations at every integration step. 

The measures of filter performance are the time averaged relative rmse 
and rms ratio, given by 



-1 ^max 



and 



k ^-^ 2/fc+l 

rvm n/r . . JU. . , „ . 

\ , , 

k Il2 



i=l 

respectively. Again, according to Eq. (13.511) . the expectation of the rms ratio 



^(/e// + l)/(2/e// + l) 



with leff being the "effective" truncation number over the whole assimilation 
window. Re is about 0.71 for a large leff- 
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5.6.2 Numerical results 

5.6.2.1 Effects of the inflation factor S and the length scale 1^ on 
the performances of the reduced rank DDFs 

We also adopt the covariance inflation and filtering techniques in our exper- 
iments to improve the performances of the reduced rank DDFs. To examine 
the effects of the inflation factor 6 and the length scale Ic on the perfor- 
mances of the filters, we let 6 and Ic take values from the sets : 0.5 : 10 and 
10 : 20 : 400, respectively. The values of the other parameters are: interval 
length h = 3, lower bound k = 3, upper bound /„ = 6, and the threshold at 
the first assimilation cycle Fi = 1000. 

We choose an initial condition at random to start a control run, and thus 
obtain a trajectory of the true states within the specified assimilation window. 
We then add some Gaussian noise drawn from the distribution (v^ : 0, 1) 
to the true trajectory to generate the observations. The noise level (relative 
rmse) 6°^"" ~ 0.22. We also follow the procedure in § l4.7.3.1l to initialize the 
reduced rank DDFs by generating 6 randomly perturbed initial conditions 
as the background ensemble at the first assimilation cycle. Then we use the 
ensemble transform Kalman filter (ETKF) to update the background to the 
analysis, and so generate sigma points. After propagating the sigma points 
forward, the DDFs can start running recursively from the second cycle. 

In our experiments, we first examine the performances of the DDFs in 
terms of the relative rms errors. To this end, we plot the relative rms errors 
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01 23456789 10 

Inflation factors 

Figure 5.1: The relative raise of the DDI filter as a function of the inflation 
factor 6 and the length scale le- 
af the DDI, DD2 filters and the CDF as functions of 5 and Ic in Figs. 15. H 
15.21 and 15.31 respectively. In all these figures, when fixing 6, if 6 is relatively 
small (e.g., 6 = 2 for the DDI filter), the relative rms errors are roughly 
monotonically increasing functions of Ic- If S is relatively large (e.g., 6 = 5 
for the DDI filter), then the relative rms errors of all three DDFs exhibit the 
U-turn behaviour as Ic increases. On the other hand, when fixing 1^, if Ic is 
relatively large (e.g., Ic = 250 for the DDI filter), then the relative rms errors 
are roughly monotonically decreasing functions of 6. If Ic is relatively small 
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Figure 5.2: The relative rmse of the DD2 fiher as a function of the inflation 
factor 6 and the length scale Ic- 

(e.g., Ic = 100 for the DDI filter), however, then the relative rms errors of all 
three DDFs also exhibit the U-turn behaviour as 6 increases. 

A comparison between the DD2 filter and the CDF shows that these two 
filters have similar performances, with the DD2 filter being slightly better 
than the CDF for some parameter values (e.g., 6 = 8 and Ic = 200). A 
comparison between the DDI and DD2 filters with the same Ic indicates 
that, when S is relatively small (say 6 = 1), the DD2 filter outperforms the 
DDI filter, as one might expect. However, if 6 is relatively large (say 6 = 8), 
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Figure 5.3: The relative rmse of the CDF as a function of the inflation factor 
b and the length scale Ic- 

the DDI filter may instead outperform the DD2 filter. In fact, within the 
ranges of the parameters we have tested, the DD2 filter and the CDF are 
always divergent, in the sense that their relative rms errors are always larger 
than the noise level ~ 0.22. In contrast, in some regions of Fig. 15. ![ 
e.g., the area surrounded by the contour level curve marked by the value of 
0.2, the axis on the right (corresponding to 5 = 10) and the axis on the top 
(corresponding to Ic = 390, not marked in the figure), the relative rmse of 
the DDI filter is actually less than ef''", and thus non- diver gent. 
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Figure 5.4: The relative rms errors of the DDFs as functions of the inflation 
factor 6 when there is no covariance filtering. Here we consider two scenarios: 
lower bound k = 3, upper bound = 6 and lower bound k = 10, upper bound 
lu = 13, with other parameters following the same setting at the beginning 
of ^ 15.6.2.11 

Our explanation for the above counter-intuitive phenomenon is that it is 
the introduction of the covariance filtering technique that makes the DDI 
filter outperform its second order counterparts. As shown in Fig. 15.41 when 
there is no covariance filtering, the DD2 filter and the CDF always outperform 
the DDI filter, given the same parameter settings (either li = 3 and = 6, or 
li = 10 and lu = 13). Moreover, by comparing Fig. 15.41 with Fig. 14.31 one can 
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see that, where there is no covariance filtering, the SUKF always outperforms 
the DDFs within the range of 6 tested. The DD2 filter and the CDF are 
comparable with the ETKF in general, while the DDI filter underperforms 
the ETKF. In this sense, introducing covariance filtering to a filter might 
significantly influence its performance. Here it substantially improved the 
performance of the DDI filter in some parameter regions (cf. Figs. 15.41 and 
15.11) . However, from the theoretical point of view, in the literature there 
is still no in-depth understanding of how the covariance filtering technique 
affects the performance of a filter. 

Next we examine the rms ratios of the DDFs. We plot the rms ratios of 
the DDI, DD2 filters and the CDF as functions of 6 and Ic in Figs. 15.51 15.61 
and 15.71 respectively. In all these figures, when fixing 6, for relatively small 
6 (say 6 = 2 for the DDI filter), the rms ratios of the DDFs appear to be 
a monotonically increasing function of Ic- If S is relatively large (say 6 = 5 
for the DDI filter), then the rms ratios also exhibit the ?7-turn behaviour as 
Ic increases. On the other hand, when fixing Ic, for relatively large Ic (say 
Ic = 150 for the DDI filter), the rms ratios of the DDFs tend to decrease as 
6 increases. If Ic is relatively small (say Ic = 80 for the DDI filter), the rms 
ratios also exhibit the ?7-turn behaviour as 6 increases. 

To make sigma points indistinguishable from the truth (i.e., R ~ 0.71), 
one should take the parameter values of 6 and Ic within the strip between 
the contour levels of 0.7 and 0.8. However, overestimation of the analysis 
covariance (e.g. R < 0.71) will improve the performances of all the DDFs in 
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Figure 5.5: The rms ratio of the DDI filter as a function of the inflation 
factor 6 and the length scale Ic- 

the sense that they can achieve lower relative rms errors. This is consistent 
with the situations in the EnKF and the reduced rank SUKF, as we have 
shown in Chapters [3] and H] respectively. 

5.6.2.2 Effect of the interval length h on the performances of the 
reduced rank DDFs 

For all the DDFs, we set the threshold Fi = 1000, the lower bound li = 3, 
the upper bound = 6, the length scale of covariance filtering 1^ = 240, the 
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Figure 5.6: The rms ratio of the DD2 filter as a function of the inflation 
factor S and the length scale Ic- 

covariance inflation factor 6 and the interval length h take values from the 
set : 0.5 : 10 and 1 : 0.5 : 5, respectively. 

First we examine the effect of h on the relative rms errors. As shown in 
Figs. 15.81 15.91 and l5. 101 when fixing 6, for relatively small 6 (say 5 < 6 for the 
DDI filter), the rms errors of the DDFs appear insensitive to the change of 
h. But for relatively large 6 (say 5 > 7 for the DD2 filter), the rms errors of 
the DDFs tend to decrease monotonically as h increases. When fixing h, the 
rms errors of the DDFs either decrease monotonically or exhibit the U-turn 
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Figure 5.7: The rms ratio of the CDF as a function of the inflation factor 5 
and the length scale Ic- 

behaviour as 6 increases, similar to what we have already seen in § 15.6.2. 1[ 

Next we examine the effect of h on the rms ratios. As shown in Figs. 15. IH 
15.121 and I5.13[ the rms ratios of the DDFs are all monotonically decreasing 
functions of h. To make sigma points indistinguishable from the truth, one 
should take values of h and 6 that make the rms ratios close to 0.71. However, 
overestimation of the analysis covariance (e.g. R < 0.71) can also improve the 
performances of the DDFs in the sense that they can achieve lower relative 
rms errors. 
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Figure 5.8: The relative rmse of the DDI fiher as a function of the inflation 
factor S and the interval length h. 

5.6.2.3 Effects of the threshold Fi and the upper bound /„ on the 
performances of the reduced rank DDFs 

Now we examine the effects of the threshold Fi and the upper bound on 
the performances of the DDFs. Like the experiments for the reduced rank 
SUKF, here we only examine the effects of these parameters on the relative 
rms errors of the DDFs. 

In the first experiment, we let the covariance inflation factor 6 = 0, the 
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Figure 5.9: The relative rmse of the DD2 fiher as a function of the inflation 
factor S and the interval length h. 

length scale Ic = 240, the initial ensemble size n = 6, the interval length 
h = 3, the lower bound = 3 and the upper bound = 6. We vary the 
threshold Fi such that log^gFi takes values from the set 2 : 0.5 : 5.5. 

We show the numerical results in Fig. 15.141 As for the case of the reduced 
rank SUKF, a larger threshold Fi in the DDFs does not necessarily lead to 
lower rms errors. For example, in the DD2 filter, the optimal Fi is Fi = 10^, 
rather than 10^. A possible explanation for this phenomenon was given in 
^ 14.7.3.31 
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Covariance inflation factor 6 

Figure 5.10: The relative rmse of the CDF as a function of the inflation 
factor 6 and the interval length h. 

In the second experiment, we take 6 = 6, Ic = oo (i.e., no covariance 
filtering), n = 10, Fi = 1000, and h = 3. We fix the lower bound = 3, but 
take the values of the upper bound Z„ from the set 6 : 1 : 40. 

In Fig. 15.151 we plot the relative rms errors of the DDFs as functions of 
the upper bound /„. For the DDI filter, the relative rmse enters a plateau for 
> 9, so that choosing > 9 does not significantly improve the performance 
of the DDI. For the DD2 filter and the CDF, however, their relative rms 
errors appear to be monotonically decreasing until reaches 35, after which 
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Figure 5.11: The rms ratio of the DDI fiher as a function of the inflation 
factor S and the interval length h. 

the relative rms errors do not change significantly as lu further increases. 



5.7 Summary of the chapter 

In this chapter we introduced the idea of using Stirling's interpolation formula 
to solve the recast problem in Fig. 14. H and presented three divided difference 
approximation schemes. We conducted accuracy analyses for these three 
schemes through Taylor series expansions. Analytical results showed that 
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Covariance inflation factor 6 

Figure 5.12: The rms ratio of the DD2 fiher as a function of the inflation 
factor S and the interval length h. 

the first order divided difference (DDI) approximation is less accurate than 
the second order and central (divided) difference approximation (DD2 and 
CD) schemes, while the DD2 and CD approximation schemes themselves 
are comparable with the specific scaled unscented transform (SUT) with the 
scale factor a = 1 and the compensation parameter /5 = (cf. Chapter Hj). 

Incorporating the above approximation schemes into the propagation step 
of recursive Bayesian estimation (RBE) leads to the corresponding divided 
difference filters (DDFs). To reduce the computational cost of the DDEs 
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Figure 5.13: The rms ratio of the CDF as a function of the inflation factor 
6 and the length scale Ic- 

in high dimensional systems, we also introduced the reduced rank DDFs 
following the idea in § 14.61 

For illustration, we used the 40-dimensional LE 98 system as the testbed 
to demonstrate the details in implementing the DDFs. We also investigated 
the effects of filter parameters (e.g., h, Fi etc) on the performances of the 
DDFs. Numerical results showed that introducing the covariance filtering 
technique may lead to counter-intuitive results: the DDI filter outperformed 
both the DD2 filter and the CDF in some situations. However, when there 
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Figure 5.14: Relative rms errors of the DDFs as functions of the threshold 



was no covariance filtering, numerical results were still consistent with the 
results of accuracy analyses in § I5.3.2I 
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Figure 5.15: Relative rms errors of the DDFs as functions of the upper 
bound L. 
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Chapter 6 

Gaussian sum filters for data 
assimilation 

6.1 Overview 

In the previous chapters we considered the data assimilation problem in non- 
linear/Gaussian systems. In practice, however, the Gaussianity assumption 
in the previous chapters is often not realistic. Instead, the dynamical and 
observation noise, and the states of the system under assimilation are non- 
Gaussian more often than not. For this reason, in this chapter we consider 
the data assimilation problem in nonlinear/non-Gaussian systems. We show 
that one may approximately solve such a problem by using the nonlinear 
Kalman filters established in the previous chapters. 

The main idea in this chapter is to approximate the prior and posterior 
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probability density functions (pdfs) of system states, and the pdfs of dynam- 
ical and observation noise in recursive Bayesian estimation (RBE) Eq. (11.31) 
via some Gaussian distributions. This is known as the Gaussian sum approx- 
imation, or the Gaussian mixture model (GMM) in the literature [3l ITU [77]. 
In the extreme situation, by letting the covariance matrix of a Gaussian dis- 
tribution tend to zero, the Gaussian distribution approaches a Dirac delta 
function defined in Eq. (13. 3p . For this reason, when all of the covariance 
matrices of the Gaussian distributions in a GMM tend to zero, the Gaussian 
sum approximation approaches a Monte Carlo approximation. 

By adopting the GMM, one can approximately decompose a nonlinear/non- 
Gaussian system into a mixture of a set of sub-systems, each of which takes 
the form of a nonlinear /Gaussian system. Thus, for each sub-system, one 
can apply the nonlinear Kalman filters introduced in the previous chapters 
for data assimilation. Incorporating the estimations of the sub-systems into 
the GMM gives an approximate explicit form for the pdf. This is normally 
regarded as a "complete" solution to the data assimilation problem, since all 
of the statistical information of interest can be obtained from the explicit 
form [9]. 

The remainder of this chapter is organized as follows. In § 16.21 we state 
the problem of interest, which differs from those in the previous chapters 
in that the systems considered in this chapter are possibly nonlinear/non- 
Gaussian. By incorporating the idea of GMM for pdf approximation into the 
framework of RBE, in § 16. 31 we derive a "new" filter (relative to the nonlinear 
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Kalman filters introduced previously), called the Gaussian sum filter (GSF), 
as the approximate solution to the data assimilation problem in § 16.21 We 
also propose an auxiliary technique to conduct pdf re-approximations, which 
aims to reduce the computational cost of the GSF in some situations and 
increase its numerical stability. For illustration, in § 16. 5[ we use the Lorenz- 
Emanuel 98 system as the testbed and examine the performances of some 
GSFs. Finally, we draw our conclusions for this chapter in § 16.61 



where the transition operator M.k,k-i and the observation operator Hk are 
both possibly nonlinear. The dynamical and observation noise, and 
respectively, are non-Gaussian, but their approximated pdfs are assumed to 
be known to us, in terms of the following GMMs: 



6.2 Problem statement 



We consider data assimilation in the following systems: 




(6.1a) 




(6.1b) 




(6.2a) 



n 



■k 




(6.2b) 
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where the notation A^(x : /i, S) means that the pdf of a random variable x 
follows a Gaussian distribution with mean fi and covariance S, and a^^ G 
[0, 1] is the weight associated with N{uk : 0, Qk,i), which shall satisfy ^ e 
[0, 1] and X]i=i ^k,i — 1- The weights ^ are defined similarly. 

6.3 Gaussian sum filter as the approximate 
solution 

To solve the data assimilation problem for Eq. (16.11) . one can approximate 
the pdf of the system states through a Gaussian sum approximation, and 
then substitute the approximated pdf into the framework of RBE Eq. (11. 3p . 
The assimilation algorithm obtained in this way is known as the Gaussian 
sum filter in the literature [U [77] . 

Concretely, let the prior pdf of the initial condition xq be p(xo) = p(xo| Y„i) 
(Y_i can be treated as an empty set if no observation is available before the 
assimilation), which can be approximated by a set of ng'' Gaussian distribu- 
tions, so that 

rag'' 

p(xo) ^ 7o,^iV(xo : 4„ , (6.3) 

i=l 

where 7o,i G [0, 1] and J2i=i 7o,i = 1- Then by applying the rules of RBE in 
Eqs. (ll.Sap and (11.3bp . one can recursively compute the prior and posterior 
pdfs of the state x^, in terms of p(xfc|Yfc_i) and p(xfc|Yfc) respectively. Con- 
sequently, we can also divide the procedures of the GSF into the propagation 
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and filtering steps. 



6.3.1 Propagation step 

Without lost of generality, we assume that at instant k — 1, we have the 
posterior pdf p(xfe_i|Yfc_i) of the system states, which is approximated by 
^fc" 1 Gaussian distributions, so that 

xa 

p(xfc_i|Y,_i) ^ ■ ^Li.' PLi.) ' (6-4) 

1=1 

where jSk^i^i E [0, 1] and J2i=i^ Pk-i,i = 1- Moreover, by Eqs. fl6.1al) and 

p(xfc|xfc_i) ^ al^N{yik ■ Mk-i,k{^k-i), Qfc,0 • (6.5) 

i=l 

Then, according to Eq. (11.3b[) . the prior pdf p(xfc|Yfc_i) is given by 
p(x.|Y._,) = /p(x.|x._Op(x.-.|Y._,)rfx.-, 

(6.6) 



'-k-i 'i-k 



2^ 2^<i/?fc-i,ii.j(xfc) 
i=i j=i 



where 



Ijj(xfc) = J A^(xa. : Mk-i,ki^k-i),Qk-i,j)N{:sik-i ■ Xfe_i^i, Pfc_i^i)c?Xfc_i. 

(6.7) 

The evaluation of lij(xfc) can be treated as a nonlinear/ Gaussian estima- 
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tion problem, which has been discussed in the previous chapters. Conse- 
quently, the previously introduced ensemble or sigma point Kalman filter 
can be applied to approximate Ijj(xfe) as a Gaussian distribution N{xk : 
^- .A, Co -.0, where xt and P? /, are the mean and covariance of the 
background. These are evaluated by propagating forward the analysis en- 
semble or sigma points at instant k — 1, with mean covariance 
Pk-i,v through the following nonhnear/Gaussian system: 



Xfc+l — A^fc,A:+l(Xfc) + Ufcj , 

p{uk,j) = N{uk,j : 0, Qk,j) . 
Therefore, as an approximation we can re- write p(xfe|Yfe_i) as 

p(x.|Y,_0 5] 5^a^,/,_i,iV(x, : Pl(..)) 

nf 
s=l 



(6.8) 



(6.9) 



where n^^ = n%°i^n^, ■jk,s = (^k,j(^k-i,i with the integer index s being a one- 
dimensional representation of the index e.g., s — i + n^tilj — 1), 
1 <i < and 1 < j < n^. 
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6.3.2 Filtering step 

After a new observation is available, one can update the prior pdf p(xfe|Yfc_i^ 
to the posterior p(xfc|Yfc), according to Bayes' rule Eq. (II .Sap . Also note that, 
by Eqs. flGlB and flCTj) . 

p{ykW) ~ <'-^^(yk ■■ Hki^k), Rk,i) . (6.10) 

1=1 

Substituting Eqs. (16. 9p and (I6.10p into Eq. fll.3ap . one has 

p(xfc|Yfe) oc j9(yfc|xfc)j9(xfc|Yfc_i) 

1=1 j=l ^ ■ ^ 

i=i j=i 

where in the first line of Eq. (16.111) . "oc" means "proportional to" (by dis- 
carding the constant / p{yk\^k)p{'^k\^k-i)d^k in Eq. fll.3ap ). P^i in the 
third line is the projection covariance of the Gaussian random variable with 
mean x^ ■ and covariance j. This can be computed in the context of either 
the ensemble Kalman filter or the sigma point Kalman filter as introduced 
in the previous chapters. Finally, 

, , , A^(x,:x^,„PyA^(y,:7^,(x,),R,J 



Ar(y,:7^,(xy,P^:, + R 



k,jj 
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Similar to the situation in § 13.2.21 (in particular, Eq. fl3.15l) ). Eq. fl6.12p can 
be interpreted from the following point of view: one has a prior pdf N{'Xk : 
j, i) of ^A:, and a new observation y^. is obtained through the following 
observation system 



yfc = 7Yfc(xfc) + Vfcj , 

(6.13) 

P(vfcj) = iV(vfcj : 0, Rfcj) . 



According to Bayes' rule, Jjj(xfc) is then the posterior pdf of x^ with the 
observation y^ made by the observation system Eq. (16.131) . 

Consequently, Jjj(xfc) can be approximated by a Gaussian pdf N{'Xk : 
Xfc (j J), Pfc (j J)), with mean x^^^.^^.^ and covariance Pfc^(jj) computed by 

= x^, + K,,(,,,)(y, - Hki^yi (6.14a) 

n,i^,J)=K-^k,i^,){PT,^f. (6.14b) 



Here 



while P^^'j and P^^- are the cross and projection covariances of the background 
ensemble of the Gaussian distribution A^(xfc : x^j,P^j). 

Analogous to Eq. (16.91) . we let n^" = nf'nl, s = i + nf'{j — 1) (1 <i < nf' 
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and 1 < j < n^), and 

(3k,s = Pk,i^,3) = —Tt — -T^ ■ 7 — : • (6.16) 

EZi Ejii lk,al,N{y, : H.(x^,,), P^y, + 

Then we have 

p(x,|Y,) ^ ^/5,,.iV(x, : x^,„Py . (6.17) 

6.3.3 Statistics estimation based on the posterior pdf 

The posterior pdf p(xfc|Yfc), given in Eq. (16.171) . embodies all of the necessary 
statistical information. In particular, one may be interested in estimating 
the conditional mean x^ = E(xfc|Yfc) and the conditional covariance P^ = 
Cov(xfc|Yfc), which are given by [H ch. 8] 

x^ = 5Z/5mXm, (6.18a) 

S = l 

Pt = E/5.,s(PL + iKs - n)iKs - ^tV) ■ (6-18b) 

s=l 

Note that the above computations can be done in parallel. For example, 
one may use n^" independent processor units, each of which carries out a 
nonlinear Kalman filter algorithm to assimilate a sub-system described by 
Eqs. (16. 8p and (16.131) . The final results are simply the weighted averages of 
the outputs of the individual processors. 
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6.4 An auxiliary algorithm to reduce poten- 
tial computational cost 

For convenience, we call the nonlinear Kalman filter adopted in a GSF the 
"base filter" of the GSF. This can be any filter introduced in the previous 
chapters, e.g., the ensemble Kalman filter (EnKF) and the reduced rank 
sigma point Kalman filter (SPKF). 

One potential problem of the GSF is that, the number of Gaussian dis- 
tributions in the GMM may grow very rapidly in certain circumstances. To 
see this, let the number of Gaussian distributions used to approximate the 
distributions of the background, the analysis, the dynamical noise and the 
observation noise at time k be n^'^, n^", and n^, respectively. In the 
previous section we have shown that 



xb „xa „n 

(6.19) 



^fc ~ ^k-l^k ' 



xa _ ^xb V 
"'k — "'k "'k 



Therefore, if > 1 or nl > 1 at all times, and nl"" will grow exponentially 
with time, which will substantially increase the computational cost of the 
GSF. 

To reduce the computational cost, the authors in [3l [77] suggested that 
"it is possible to combine many terms into a single term without seriously 
affecting the approximation". In addition, some weights in the Gaussian sum 
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approximation, i.e., some 7fc,s's in Eq. f l6.9p and some /3fc,s's in Eq. fl6.17p . 
may be sufficiently small compared to the others so that they can be simply 
neglected P 177]. 

Another possible strategy is to conduct pdf re-approximations: at each 
assimilation cycle one uses a new GMM, with the specified number of Gaus- 
sian distributions, to approximate the prior or the posterior pdf that itself 
is expressed in terms of a GMM. For example, see [71]. To estimate the 
parameters of the new GMM (i.e., the weights, the means and covariances 
of individual Gaussian distributions), the author in [71] suggested an adop- 
tion of the expectation-maximization (EM) algorithm. However, the EM 
algorithm is an iterative method, which may require many iterations for con- 
vergence. Thus, using the EM algorithm in high-dimensional systems might 
be computationally intensive. 

In this dissertation, we propose another method for the purpose of re- 
ducing the computational cost, which is also based on the idea of pdf re- 
approximation. Our criterion is that the mean and covariance of a new 
GMM match those of the original one. The benefit of this re-approximation 
scheme is that, if one chooses a reduced rank SPKF as the base filter and 
implements the GSF in parallel, then in principle the computational speed 
of the GSF can be almost the same as that of the reduced rank SPKF. 

For illustration, let p(x) be the pdf of a random variable x, which is 
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expressed in terms of a GMM with n Gaussian distributions, i.e., 

n 

p(x) = ^aiiV(x:/ii,Si), (6.20) 



i=l 



where a, is the weight associated with the Gaussian distribution A^(x : /Xj, Sj) 
with mean /ij and covariance S,. Our objective is to approximate p(x) by 
another GMM p(x) with m Gaussian distributions {m < n): 



m—l 



p(x) = ^6,iV(x: Zi,Ai), (6.21) 

i=0 

where bi is the weight associated with the distribution A^(x : Aj). We 
want to choose proper values of bi, Zi and Aj so that the mean x and 
covariance P of p(x) match the mean x and covariance P of p(x), respectively. 
From Eq. fl^TTS]) . 



n 

X = 

i=\ 
n 



fli/ij , (6.22a) 

i=\ 
n 

P = ^ai(Si + (/ii-x)(/ii-x)^), (6.22b) 
while the mean x and covariance P of p(x) are given by 

m— 1 

i = ^ b^Zi , (6.23a) 

i=0 
m— 1 

P = ^ 6,(A, + (Z, - i)(Z, - ^f ) . (6.23b) 



s=0 
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For our purpose, we need to choose proper 6j, Zi and Aj so that x = x 
and P = P. To this end, we employ the idea of the unscented transform to 
generate a set of sigma points to capture the specified mean and covariance. 
Concretely, let 

S = [si,S2,--- ,Sp] (6.24) 

be a square root matrix (with p column vectors Sj, % = I,-- - ,p) of P, 
which can be obtained through some numerical decomposition algorithm 
(e.g., SVD). We generate a set of 2g + 1 sigma points Q with respect 
to the triplet (?7,x. Si), so that 

Zq = X, 

= x + Cv/^^Si, i = l,2,---,g, (6.25) 
Zi = 5c-c y/q + r] Sj, z = g + 1, g + 2, ■ ■ ■ , 2g, 

where c Sj is the i-th column of the square root matrix 

Si = c[si,S2,--- ,Sg] (6.26) 

for < c < 1 (g < p) y. Accordingly, we let {&i}^lo weights associated 

^This means that the number m = 2q + 1 oi sigma points, hence the number of Gaus- 
sian distributions, is always odd. If one wants to let m be even, one can use the set of 
sigma points {Zi}^^^ (by excluding Zq) in Eq. (|6.25p and the associated weights {foij^Ii 
Eq. (j6.27p for the pdf re-approximation. It can be shown that the sample mean and covari- 
ance of {Zi}^^j^, with {bi}^^^ being the weights, also capture the mean x and covariance 
P, respectively [40] . 

^For high dimensional systems like a weather forecast model, p may be in the order of 
10^ or even higher. Thus for computational efficiency, q < p is a reasonable choice. This 
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with the set of sigma points {2i}.^Q so that 

bo = — — , 

^7 (6.27) 
^^=2(^'^ = ''''---''^' 

where 77 is a free parameter analogous to A in the unscented transform (cf. 
§ 14.31) . In particular, rj = 1/2 means that bo = bi for i = 1,2, ■■ ■ ,2q, so 
that all Gaussian distributions in p(x) are equally weighted. According to 
Eq. gl]) in § 14X11 we have 

2q 

± = ^biZi = x, (6.28a) 

i=0 

2q ^ q 

HZi - i)(2. - = Si(Si)^ = J] s,(sO^ . (6.28b) 

1=0 i=l 

For simplicity, we let the covariances Aj of all the Gaussian distributions 
A^(x : Aj) in p(x) be the same, say Aj = A for i = 0, ■ ■ ■ , 2g. Moreover, 
we further express A in terms of A = 828^^, where 82 is a square root matrix 
of A. 

Substituting Eq. fl6.28bp into Eq. fl6.23bD and noting that ^ 6j = 1, 

i=0 



is the reason that we stick to this setting in this chapter. However, if one wishes to let 
q> p, the re-approximation scheme can be adjusted accordingly. The idea is to produce I 
sets of sigma points in the spirit of Eq. (|6.25p . either with the same column vectors Sj, or 
replacing Sj therein by their rotations in the vector space. Each of these sets consists of qo 

sigma points so that qo < P and Ix qo > p, but with a different coefficient Cj in Eq. (|6.25p . 

; 

In this case, a further constraint on the coefficients Ci is that XI < 1- Moreover, the 
weights in Eq. (|6.27p also have to change accordingly. 
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Aj = A, we have 

P = A + c2^s,(s,f . (6.29) 

i=l 

On the other hand, we note that 

p 

P = 5^s,(s,f . (6.30) 

1=1 

Thus in order to satisfy P = P, we require 

q p 
i=l i=q+l 

Therefore, we can choose 

S2 = [dsi, ■ ■ ■ , dSg, Sg+i, • • • , Sp] (6.32) 

as a square root matrix of A, where d = {1 the coefficient com- 

plementary to c. For convenience, hereafter we call d the complementary 
coefficient. The role of d in influencing the GMM can be illustrated through 
the following scenario. Suppose p = q, then, when d — ^ 0, we have A ^ 
and the Gaussian distributions A^(x : Zi, A) (z = 0, ■ ■ ■ , 2g + l) approach the 
delta functions with point masses located at Z^. Thus the GMM in Eq. (16.211) 
will approach a Monte Carlo approximation with Zi being the samples. On 
the other hand, when d ^ 1, we have c 0. Hence all the Gaussian distri- 
butions A^(x : Zi, A) (z = 0, ■ ■ ■ , 2g + 1) approach A^(x : x, P). Therefore, 
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the GSF will approach its base filter, e.g., the EnKF or the SPKF. 

In the previous chapters we have seen that, at each assimilation cycle of 
a reduced rank SPKF, the filter requires an SVD in order to produce sigma 
points. Therefore, for the GSF with a reduced rank SPKF as its base filter 
and equipped with the auxiliary algorithm, by letting the covariances of all 
the Gaussian distributions in the re-approximated GMM be the same, one 
only needs to perform an SVD once (at the filtering step for the SUKF, or at 
the propagation step for the DDFs) for both the purpose of generating sigma 
points for its base filter, and that of conducting a pdf re-approximation. 
Therefore, if the SPKF-based GSF is implemented in parallel, in principle 
it may achieve almost the same computational speed as the reduced rank 
SPKF itself. 

In contrast, if one chooses an EnKF (e.g., the ensemble transform Kalman 
filter) as its base filter and implements the GSF in parallel, then there are 
extra costs in conducting matrix factorization (e.g., SVD) if one equips the 
GSF with the auxiliary algorithm. The computational speed of the EnKF- 
based GSF will become almost the same as that of the SPKF-based GSF. 

Remarks : In a GSF, even if both and are equal to 1 in Eq. (16.191) 
(and so the number of Gaussian distributions does not grow), we still suggest 
an implementation of the auxiliary algorithm in the filter. The reasons are 
twofold. 

Firstly, the GSF may suffer from the outlier problem. For some Gaussian 
distributions, say iV(x : /x,, Ej) in the GMM, the observation y may be too 



220 



far way from the projection Ti.{fii) of the mean /ij onto the observation space. 
Thus the distance between y and H{fii) is so large that it makes the weight 
of the Gaussian distribution A^(x : yUj, Sj) in the GMM neghgible compared 
with other Gaussian distributions (cf. Eq. fl6.16p ). In such circumstances, if 
the tiny weights are continually carried forward to subsequent assimilation 
cycles, the weights of the GMM might "collapse" as in a particle filter [12]: 
the weight of one particular Gaussian distribution in the GMM is very close 
to 1, while the weights of the others are almost zero. In this case, the GSF 
is in effect reduced to a nonlinear Kalman filter and may suffer from some 
numerical problems as very tiny values are involved in computation. In such 
circumstances, the auxiliary algorithm is similar to the re-sampling technique 
in the particle filter, with the attempt to adjust the weights of the Gaussian 
distributions in the GMM by replacing the original Gaussian distributions 
by new ones. 

Secondly, the auxiliary algorithm may also help to decrease the compu- 
tational cost of the GSF with the reduced rank SPKF as its base filter. To 
see this, note that if the SPKF-based GSF is not equipped with the auxiliary 
algorithm, the covariances of all Gaussian distributions may not be the same. 
Therefore to produce sigma points for all the reduced rank SPKFs, one may 
have to perform an SVD for each different covariance in the reduced rank 
SPKFs. In contrast, if the SPKF-based GSF is equipped with the auxiliary 
algorithm, one only needs to conduct one SVD to generate sigma points for 
all the reduced rank SPKFs, since through a pdf re-approximation, one can 
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choose to let the covariances of all the Gaussian distributions in the new 
GMM be the same. 

6.5 Example: Assimilating the 40-dimensional 
Lorenz-Emanuel 98 system 

6.5.1 The testbed and the measures of filter perfor- 
mance 

The testbed and the measure of filter performance are also the same as those 
in § I4.7.1I The dynamical system (LE 98) is governed by 

= {xi+i - Xi-2)xi-i - Xi + 8, i = 1, • • ■ , 40 , (1331!) 

while the observation system is 

Ya; = Xfe + Vfc , (13.451) 

where follows the Gaussian distribution N{yk : 0,1). Note that there is 
no dynamical noise (except for some discretization errors) in the dynamical 
system Eq. fl3.44p . but for convenience in using the formulae established in 
§ 16.31 technically we can model the dynamical noise at instant A; by a 
Gaussian distribution N{\ik : 0, 0) with zero mean and zero covariance. 
We integrate the dynamical system Eq. f l3.44p through a fourth-order 
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Runge-Kutta method [Hll Ch. 16], and choose the integration time window 
to be from to 50 dimensionless time units, with the integration step being 
0.05. This setting is denoted by : 0.05 : 50. Similar notations will also be 
used later. We make the observations at every integration step. 

The measure of filter performance is the time averaged relative rmse in- 
troduced in § 13.4.21 which is given by 



where kmax is the number of assimilation cycles (here kmax = 1001), x^*" is the 
truth at the k-th assimilation cycle, and is the estimation of x^^ obtained 
by a GSF. 

6.5.2 Numerical results 

For illustration, we implement three GSFs with different nonlinear Kalman 
filters, namely, the reduced rank scaled unscented Kalman filter (SUKF), the 
reduced rank first order divided difference (DDI) filter (as the representative 
of the divided difference filters), and the ensemble transform Kalman filter 
(ETKF) with the centering matrix given by Eq. fl3.34p (as the representative 
of the ensemble Kalman filters) . 

For the GSFs with the SPKFs as their base filters, to reduce the com- 
putational cost, at each assimilation cycle we conduct pdf re-approximations 
at the step where the SPKFs produce sigma points. With this choice, we 
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perform an SVD only once for both the purposes of pdf re-approximation 
and the generation of sigma points. Specifically, for the GSF with the SUKF 
as its base filter, we conduct a pdf re- approximation at the filtering step, 
thus it is the posterior pdf, in terms of a GMM, that is re-approximated. 
However, for the GSF with the DDI filter as its base filter, we conduct a 
pdf re-approximation at the propagation step, thus it is the prior pdf that is 
re-approximated. For the GSF with the ETKF as its base filter, conducting 
a pdf re-approximation at either the propagation step or the filtering step 
has the same computational cost. In our experiments we choose to conduct 
a re-approximation at the propagation step. 

Note that in our experiments, both the dynamical and observation noise 
are characterized by a single Gaussian distribution. Therefore the number of 
Gaussian distributions in a GSF does not grow with time. However, for the 
reasons given in § 16.41 we still choose to perform pdf re-approximations in all 
subsequent experiments. In those circumstances, a re-approximated GMM 
will have the same number of Gaussian distributions as does the original 
GMM. 

Concretely, the parameters with respect to the GSFs are set as follows. 
We let the scale parameter 77 = 1/2 in Eq. jf|6.27l) . so that all Gaussian 



distributions in a GMM are equally weighteqf 



We let the number q 



: 1 : 5 in § 16. 4[ which implies that the number m = 2g + 1 of Gaussian 



^This implies that we do not prefer any particular Gaussian distribution in the GMM. 
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distributional takes values from the set of odd integers 1 : 2 : 11. Finally, we 
let the complementary coefficient d take values from the set 0.05 : 0.1 : 0.95. 

6.5.2.1 The Gaussian sum filter with the reduced rank scaled un- 
scented Kalman filter as the base filter 

For convenience, hereafter we call a GSF with the reduced rank SUKF as 
its base filter the "SUKF-based GSF" (similar terminologies has been used 
previously and will be adopted for the GSFs with other base filters). Since 
in the previous chapters, we have already studied the effects of the intrinsic 
parameters on the performances of the base filters, we do not vary these 
parameters in subsequent experiments. 

The intrinsic parameters of the reduced rank SUKF are set as follows. 
The length scale of covariance filtering = 240, the covariance infiation 
factor 6 = 7 (cf. § 13.3.31 for the meanings of these two parameters), the 
parameters a = 1, (3 = 2 and A = —2 (cf. § 14.51) . the lower bound li = 10, 
the upper bound = 10, and the threshold at the first assimilation cycle 
Fi = 1000 (cf. § 14.61 or § I4.7.3p . The ensemble size of the background at the 
first assimilation cycle is 10. 

In Fig. 16.11 we show the relative rms errors of the SUKF-based GSFs 
(with different numbers of Gaussian distributions) as functions of the com- 
plementary coefficient d. As one can see, when the number m of Gaussian 
distributions is relatively small, say m = 1,3,5, the relative rmse does not 



A re-approximated GMM and the original one have the same number m each time. 
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Figure 6.1: The relative rms errors of the SUKF-based GSFs (with differ- 
ent number of Gaussian distributions) as functions of the complementary 
coefficient d. 

change significantly as d increases from 0.05 to 0.95. In particular, when 
m = 1, the SUKF-based GSF is equivalent to the SUKF itself, and d does 
not affect the relative rmse at all, since pdf re- approximations do not take 
effect in this case. In contrast, when m is relatively large, say m = 7,9, 11, 
the relative rmse exhibits a different behaviour as d increases. The relative 
rmse with a relatively small value for d, say d = 0.05, is much larger than 
that with a relatively large value for d, say d = 0.95. Note also that when 
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d = 0.95 (close to 1), the relative rms errors of all the GSFs are close to that 
of the reduced rank SUKF itself. The reason for this was given in § 16. 4[ 

The under-performance of the GSF with a relatively large m but small 
d might have a connection with the slow convergence rate of a Monte Carlo 
approximation. When d is small, the GMM approaches a Monte Carlo ap- 
proximation, with a convergence rate possible in the order of (9(m~^/^). Since 
the relatively large values for m (say m = 11) used in our experiments are 
typically very small for the purpose of convergence, it leads to relatively 
large estimation errors. On the other hand, the fact that when d is small 
(say d = 0.05), the SUKF-based GSF with a small m (say m = 3) per- 
forms better than that with a relatively large m (say m = 11), is less well 
understood. A possible explanation might be that, when m is small, the 
GSF is close to the reduced rank SUKF, which implicitly assumes that the 
system states follow a Gaussian distribution. Although the Gaussianity as- 
sumption might not be realistic, it still works better than the Monte Carlo 
approximation with a small number of samples. 

Fig. 16.11 indicates that, with other conditions being the same, a larger 
number m of Gaussian distributions does not necessarily guarantee a better 
performance. For example, when d = 0.15, the relative rmse of the GSF with 
m = 3 is much lower than that of the GSF with m = 11. For this reason, 
in order to compare the performances of the GSFs with different numbers 
of Gaussian distributions, we need to adopt a new measure. Since in the 
context of our experiments, the relative rmse Cr in Eq. (13.461) is a function of 
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Number m of Gaussian distributions 



Figure 6.2: The minimum relative rmse e™" of the SUKF-based GSF as a 
function of the number m of Gaussian distributions. 

m and d, we choose the minimum conditional relative rmse e^*"(m) as the 
new measure, which reads 

e™"(m) = argmine^(m, d) . (6.33) 

d 

In Eq. (16.331) . e5T**"(m) means the minimum value of er{m, d) within the range 
of d tested for a given m. For convenience, hereafter we call the minimum 
conditional relative rmse "the minimum relative rmse" for short when it 
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causes no confusion. 

In Fig. we plot e™" of the SUKF-based GSF as a function of the 
number m of Gaussian distributions. As one can see, e™" decreases mono- 
tonically as m increases. Thus a larger number of Gaussian distributions can 
benefit the performance of the SUKF-based GSF in the sense that it can 
achieve a lower minimum relative rmse. 

6.5.2.2 The Gaussian sum filter with the reduced rank first order 
divided difference filter as the base filter 

For convenience, hereafter we call the GSF with the reduced rank DDI fil- 
ter as its base filter the "DDI-based GSF". The intrinsic parameters of the 
reduced rank DDI filter are set as follows. The length scale of covariance 
filtering 1^ = 240, the covariance infiation factor 6 = 7, the interval length 
of interpolation h = 3 (cf. § 15. 3p . the lower bound li = 10, the upper bound 
lu = 10, and the threshold at the first assimilation cycle Fi = 1000. The 
ensemble size of the background at the first assimilation cycle is 10. 

In Fig. 16.31 we plot the relative rms errors of the DDI-based GSFs (with 
different numbers of Gaussian distributions) as functions of d. The DDI- 
based GSF exhibits similar behaviour to the SUKF-based GSF in terms of 
the relative rmse. Indeed, when the number m of Gaussian distributions 
is relatively small, say m = 1, 3, 5, the relative rms errors do not change 
significantly as d increases from 0.05 to 0.95. Again, when m = 1, the DDI- 
based GSF is equivalent to the DDI filter itself, thus the value of the relative 
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Figure 6.3: The relative rms errors of the DDI-based GSFs (with differ- 
ent number of Gaussian distributions) as functions of the complementary 
coefficient d of pdf re-approximation. 

rmse is independent of d. For relatively large m, say m = 9, 11, the relative 
rmse is relatively large when d = 0.05, but decreases rapidly as d increases. 
When d = 0.95, the relative rms errors of all the GSFs again approach that 
of the DDI fiher. 

In Fig. 16.41 we plot the minimum relative rmse e™" of the DDI-based 
GSF as a function of the number m of Gaussian distributions. As one can 
see, e™" also decreases monotonically as m increases. Thus in this sense. 
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Figure 6.4: The minimum relative rmse of the DDI-based GSF as a function 
of the number m of Gaussian distributions. 

a larger number of Gaussian distributions benefits the performance of the 
DDI-based GSF in the context of our experiment settings. 

6.5.2.3 The Gaussian sum filter with the ensemble transform Kalman 
filter as the base filter 

Similarly, we call the GSF with the ETKF as its base filter the "ETKF-based 
GSF" . The intrinsic parameters of the ETKF are set as follows. The length 
scale for covariance filtering 1^ = 50, the covariance inflation factor 6 = 5. 
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Figure 6.5: The relative rms errors of the ETKF-based GSFs (with differ- 
ent number of Gaussian distributions) as functions of the complementary 
coefficient d. 

The ensemble size of the background at the first assimilation cycle is 10. 

In Fig. 16.51 we plot the relative rms errors of the ETKF-based GSFs as 
functions of d. As is evident, the ETKF-based GSF also exhibits similar 
behaviour to the SUKF and DDI based GSFs, in terms of the relative rmse. 
Indeed, when the number m of Gaussian distributions is relatively small, say 
m = 1, 3, 5, the relative rms errors do not change significantly as d increases 
from 0.05 to 0.95. For relatively large m, say m = 7,9, 11, the relative rmse is 
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relatively large when d is small, say d = 0.05. The relative rmse also drops as 
d increases, but not as rapidly as the SUKF and DDI based GSFs. When d = 
0.95, the relative rms errors of the GSFs with m > 1 also appear to converge, 
but not to the relative rmse of its base filter, as in the SUKF and DDI based 
GSFs. Our explanation for this difference is that, when m = 1 we chose 
to implement the ETKF-based GSF in the same way as that in Chapter [21 
where the square root of the background covariance is directly obtained from 
the background ensemble (without conducting any SVD). But when m > 1, 
the square root of the background covariance is obtained through an SVD, 
as is required for the purpose of pdf re- approximation. 

In Fig. 16.61 we show the minimum relative rmse e™*" of the ETKF-based 
GSF as a function of the number m of Gaussian distributions. Unlike the 
cases of the SUKF and DDI based GSFs, the minimum relative rmse of 
the ETKF-based GSF decreases as m increases from 1 to 3. After that, 
if one further increases m, one will instead obtain larger values for ej!*™. 
Nevertheless, all these values in the case m > 1 are lower than that in the 
case 171 = 1. Thus the ETKF-based GSF with m > 1 also performs better 
than its base filter, the ETKF. However, a larger number m of Gaussian 
distributions does not guarantee a lower relative rmse. Instead, m = 3 is the 
best choice in the context of our experiment settings. This phenomenon is 
not understood yet. 
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Number m of Gaussian distributions 



Figure 6.6: The minimum relative rmse of the ETKF-based GSF as a func- 
tion of the number m of Gaussian distributions. 

6.6 Summary of the chapter 

In this chapter, we introduced the idea of conducting pdf approximations 
through Gaussian sum approximations, also known as Gaussian mixture 
models, to approximate the integrals in recursive Bayesian estimation (RBE). 
Applying this idea leads to a "new" algorithm (relative to the filters discussed 
in the precious chapters), called the Gaussian sum filter, that can be used to 
solve the data assimilation problem in nonlinear /non- Gaussian systems ap- 
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proximately. As a feature, a Gaussian sum filter consists of a set of nonlinear 
Kalman filters, which in principle can be any one that was introduced in the 
previous chapters. 

A potential problem of the Gaussian sum filter is that, the number of 
distributions in a mixture model might increase very rapidly with time. This 
can substantially increase the computational cost. To overcome this prob- 
lem, we suggested conducting pdf re- approximations by using some Gaussian 
mixture models to approximate others. To do this, wc proposed an auxiliary 
algorithm such that the re- approximated Gaussian mixture model preserves 
the mean and covariance of the original one. 

For a SPKF-based Gaussian sum filter, if it is implemented in parallel, 
then in principle its computational speed can be almost the same as that of 
the reduced rank SPKF. Note that, even if the number of Gaussian distri- 
butions in a Gaussian sum filter does not grow with time, we still suggest 
conducting pdf re-approximations, as it can avoid some numerical problems. 

For illustration, we used the 40-dimensional Lorenz-Emanuel 98 system 
as the testbed and examined the performances of three Gaussian sum filters, 
with the reduced rank SUKF, the reduced rank DDI filter, and the ETKF 
as their base filters, respectively. Numerical experiments showed that all 
three Gaussian sum filters outperformed their base filters in terms of their 
minimum (conditional) relative rms errors. 
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Chapter 7 



Conclusions and future work 

7.1 Concluding summary 

Since the pioneering work of Evensen [23j, the ensemble Kalman filter (EnKF) 
has become a popular method in the data assimilation community. Various 
versions of the EnKF have been proposed. For examples, see P, [HI [131 [T5| 
[23l[2l[25l[26l[33l[ni[HIl[89l[9l]. 

One of the objectives of this dissertation is to understand the various 
versions of the EnKF, as well as other recursive filters introduced in the 
previous chapters, based on the uniform framework of recursive Bayesian 
estimation (RBE). From the point of view of RBE, all of the filters discussed 
in the previous chapters can be deemed as different approximation schemes 
adopted to approximate the integrals of RBE in possibly different scenarios. 

Another objective is to introduce a few nonlinear filters, which include a 
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new family of nonlinear Kalman filter, called the sigma point Kalman filter 
(SPKF), and the corresponding Gaussian sum filter (GSF) with the SPKF 
as its base filter, called the sigma point Gaussian sum filter (SPGSF) con- 
sequently. The SPKF encompasses two types of nonlinear Kalman filters: 
the scaled unscented Kalman filter (SUKF) and the divided difference filter 
(DDF). They provide better estimation accuracy than some of the conven- 
tional methods hke the extended Kalman filter (EKF). A further advantage 
of the SPKF is that it does not require evaluations of the derivatives of a 
nonlinear function, which is convenient in practice. 

Our last objective is to increase the computational efficiency of the SPKF 
and the SPGSF in high dimensional systems. For each type of the SPKF, 
we introduced a reduced rank version for it by conducting singular value 
decompositions (SVDs) on the covariance matrices of the system states. In 
this way, the number of sigma points in the SPKF can be much less than the 
dimension of the system under assimilation. Therefore the computational 
cost of the SPKF can be reduced in high dimensional systems. 

For the GSF, one potential problem is that the number of Gaussian dis- 
tributions may increase rapidly with time. To address this problem, we sug- 
gested conducting a pdf re- approximation at each assimilation cycle, such 
that one uses a new Gaussian mixture model (with less Gaussian distribu- 
tions) to approximate the original one. To this end, we proposed an auxiliary 
algorithm based on the idea of the unscented transform. If a SPGSF equipped 
with the auxiliary algorithm is implemented in parallel, then in principle its 
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computational speed can be almost the same as that of the SPKF. 

In Chapter [1] we introduced some basic concepts in data assimilation. We 
discussed the objectives of this dissertation and introduced two frameworks, 
least squares estimation (LSE) and RBE, to derive the data assimilation 
algorithms in this dissertation. 

Chapter [2] was the starting point of discussing various nonlinear filters 
in subsequent chapters. We considered the data assimilation problem in 
linear/Gaussian systems. Applying RBE to solve the problem led to the 
well-known Kalman filter. We also derived the same result from the point of 
view of LSE. Based on it, we obtained a useful variant of the conventional 
Kalman filter, called the Kalman filter with fading memory (KF-FM), which 
can improve the robustness of the filter. Apart from the KF-FM, we also 
introduced the square root filter (SRF) as another variant of the conventional 
Kalman filter in order to increase the numerical accuracy and stability of 
a filter. The nonlinear filters introduced in subsequent chapters were all 
implemented in both the forms of the KF-FM and the SRF. 

Chapters [3] - [5] studied different extensions of the conventional Kalman 
filter to nonlinear/Gaussian systems from the point of view of RBE. Chap- 
ter [3] focused on the ensemble Kalman filter (EnKF). In the literature there 
are two major types of the EnKF: the stochastic EnKF and its determin- 
istic counterpart, the ensemble square root filter (EnSRF). In principle, all 
implementations of the EnKF may be deemed as different approximation 
schemes that are designed to approximate the integrals in RBE numerically. 
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To improve the performance of the EnKF, two auxihary techniques, namely 
covariance inflation and filtering, were frequently adopted. Covariance infla- 
tion not only compensates for systematic underestimation of the error covari- 
ances in the EnKF due to the effect of small ensemble size, but also makes 
the EnKF behave like the KF-FM. On the other hand, covariance filtering 
aims to smooth out spuriously large correlations between distant locations. 
Through some numerical experiments, we examined the performances of the 
stochastic EnKF and the ensemble transform Kalman filer (ETKF), one of 
the EnSRFs. Numerical results showed that the ETKF consistently outper- 
formed the stochastic EnKF. 

In Chapter H] we introduced one type of nonlinear Kalman filter, called 
the scaled unscented Kalman filter (SUKF), which is based on the concept of 
the scaled unscented transform (SUT). One advantage of the SUKF is that it 
does not require linearizing a nonlinear system, and so is convenient in imple- 
mentation. We performed an accuracy analysis of the SUKF through Taylor 
series expansions and compared the accuracy of the unscented Kalman filter 
(UKF), a special case of the SUKF, with that of the EnKF in Appendix O 
We showed that the UKF can achieve better accuracy than the EnKF under 
the assumption that the system under assimilation is nonlinear/Gaussian. 
To reduce the computational cost of the SUKF in high dimensional systems, 
we proposed a reduced rank version of the filter. Using the Lorenz-Emanuel 
98 model, we conducted numerical experiments to examine the effects of the 
intrinsic parameters of the reduced rank SUKF on the performance of the 
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filter. We also compared the performance of the reduced rank SUKF with 
that of the ETKF. Numerical results showed that, given the same amount of 
information, the reduced rank SUKF may consistently outperform the ETKF 
if there is no covariance filtering conducted on both filters. 

In Chapter [5] we presented another type of nonlinear Kalman filter, called 
the divided difference filter (DDF), which is based on Stirling's Interpolation 
Formula. Like the SUKF, the DDFs do not require linearizing a nonlinear 
system under assimilation. Instead, they also generate sigma points for the 
purpose of approximation. For this reason, in the literature the SUKF and 
the DDFs are uniformly called the sigma point Kalman filters (or derivative- 
free filters). We conducted accuracy analyses on the DDFs through Taylor 
series expansions. For data assimilation in high dimensional systems, we also 
proposed reduced rank versions of the DDFs. We examined the effects of the 
intrinsic parameters on the performances of the reduced rank DDFs. We also 
made a comparison between the DDFs, the SUKF and the ETKF. Numerical 
results showed that, when there is no covariance filtering, the order of the 
filter, with performance ranked from best to worst, was the SUKF, the DD2 
filter, the CDF, the ETKF, and the DDI filter. However, when covariance 
filtering is introduced, it may significantly affect their performances and lead 
to counter-intuitive results. For example, we showed that the performance 
of the DDI filter was much better than that of the DD2 filter for certain 
parameter values. 

In Chapter [6] we introduced the Gaussian sum filter (GSF) for data as- 
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similation in nonlinear/non-Gaussian systems. A GSF essentially consists 
of a set of parallel nonlinear Kalman filters. All the nonlinear Kalman fil- 
ters introduced in the previous chapters, e.g., the EnKF, the SUKF and the 
DDEs, can be adopted as the base filters of the GSF. A potential problem 
of the GSF is that, in some situations, the number of Gaussian distributions 
in the GSF may increase very rapidly with time. To tackle this problem, we 
suggested conducting pdf re-approximations. To this end, we proposed an 
auxiliary algorithm based on the concept of the unscented transform. If a 
GSF adopts the SUKF or the DDFs as its base filter and is implemented in 
parallel, then in principle the GSF can achieve almost the same computa- 
tional speed as its base filter, the SPKF. But if an EnKF is chosen as the base 
filter, then there is an extra cost at each assimilation cycle in conducting an 
SVD for the purpose of pdf re-approximation, and the computational speed 
of the EnKF-based GSF becomes roughly the same as that of the SPGSF. 
We also conducted numerical experiments to examine the performances of 
the GSEs with different base filters. Numerical results showed that the GSEs 
consistently outperformed their base filters in terms of the minimum condi- 
tional relative rmse. In this sense, whenever feasible, it would be beneficial 
to implement the GSF rather than use its base filter. 

7.2 Main results of this dissertation 

The main results of this dissertation are the following. 
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Firstly, we studied the SPKF, including the SUKF and the DDFs. For 
data assimilation in high dimensional systems, we proposed the reduced rank 
SPKF to reduce the computational cost. The relevant materials presented in 
this dissertation were partially drawn from the research works [Ml EH] . 

Secondly, we combined the reduced rank SPKF and the GSF to assimi- 
late nonlinear and non-Gaussian systems. To increase the computational effi- 
ciency, we proposed an auxiliary algorithm to conduct pdf re-approximations. 
The relevant materials presented in this dissertation were partially drawn 
from the research works [SH \57\ . 

7.3 Outstanding problems and future works 

7.3.1 Intrinsic parameters of various nonlinear filters 

One important aspect in implementing the nonlinear filters in the previous 
chapters is to specify their parameters, which include the covariance inflation 
factor, the length scale of covariance ffitering, and ffiter-specific parameters, 
such as parameters a, (3 and A in the SUKF. We explained, in a qualitative 
way, the effects of some parameters on the performances of these filters. 
For example, the U-turn behaviour of the relative rmse as a function of the 
covariance inflation factor. However, we are far from being able to fully 
understand these effects in a quantitative way. Thus in practice, we may 
have to resort to intensive searches to find the proper values for those filter 
parameters. 
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7.3.2 Performance analysis 

Here performance analysis of a nonlinear filter refers to convergence or error 
bound analysis of the filter. Convergence analysis aims to analyze if the 
estimated pdf of the system states asymptotically converges to the true pdf. 
If the answer is yes, one may continue to study the convergence rate. If the 
answer is no, then one may instead study if there exists any bound of the 
estimation errors. For examples, see [22l Ch. 2] and [19l [90] . 

The authors in [90] derived an error bound for the full rank UKF under 
some assumptions. Similar arguments may also be applied to analyze the 
filters presented in this dissertation. However, we expect that analyzing 
the filters equipped with the covariance filtering technique would be more 
complicated. For this purpose, understanding how the covariance filtering 
technique affects the behaviour of a filter will be a preliminary step. 

Another interesting topic is the convergence analysis of the GSF. In a 
recent work [Ij, the author reported a proof of weak convergence \^ of the 
GSF with the extended Kalman filter as its base filter. This highlights a 
possible way to prove the weak convergence of the GSFs with other base 
filters. 



^Let X be a random variable, and {x„} be a sequence of random variables. Then we 
say the sequence {x„} converges weakly to x, if the cumulative density function i^x„(??) 
of x„ tends to the cumulative density function F^irj) of x for all rj when n — > cx) [T]. 
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7.3.3 Parameter estimation and smoothing problems 

We may extend the methods presented in this dissertation to parameter 
estimation and smoothing problems. 

A convenient strategy to address a parameter estimation problem is to 
treat the parameters to be estimated as (unobserved) system states, so that 
the problem is recast as a state estimation problem [85]. From this point of 
view, in principle, all the filters presented in this dissertation can be applied 
to (approximately) solve the parameter estimation problem. 

Solving a smoothing problem in general is more complicated as it may re- 
quire a dynamical system to run backwards. Currently, a popular smoothing 
algorithm in the data assimilation community is the four- dimensional vari- 
ational data assimilation (4D-Var) [IHl ch. 5] 1^. A shortcoming of 4D-Var 
is that it only generates an improved initial condition, but without the as- 
sociated error covariance to indicate how good the estimation might be. To 
overcome this problem, one may adopt a Kalman smoother [731 ch. 9], which 
can provide both an estimation of a system state and its associated error co- 
variancqj. Again, the three problems in data assimilation, i.e., nonlinearity, 
non-Gaussianity and high dimensionality, might also arise in a smoothing 
problem. Thus one may apply the methods presented in this dissertation to 



^4D-Var is essentially a smoothing algorithm, although it is customary in numerical 
weather prediction (NWP) to use it for prediction. In doing this, one assumes that an 
improved initial condition obtained by 4D-Var yields an improved forecast [93] . 

•^In fact, it can be shown that in linear /Gaussian systems, 4D-VAR is equivalent to the 
fixed-lag Kalman smoother (but without the calculations of the associated error covari- 
ances) [5D] . 
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tackle these problems. 



7.3.4 Particle filter 

The particle filter (PF) [HlEn] is another type of filter designed for nonlinear/non- 
Gaussian scenarios, which can also be interpreted from the point of view of 
RBE The PF also conducts pdf approximations to approximate the in- 
tegrals in RBE. But unlike the GSF, the PF adopts a Monte Carlo method, 
rather than the Gaussian mixture model (GMM), for the purpose of approx- 
imation. 

With sufficient samples, it can be shown that the PF can approach the 
optimal solution to a nonlinear/non-Gaussian estimation problem [22l [30] . 
The major difficulty preventing the use of the PF in high dimensional sys- 
tems is that the number of samples from the pdf of the system states needs 
to scale exponentially with the dimension of the system under assimilation 
(often known as the curse-of-dimensionality) [121 EH] . Otherwise the weights 
associated with the samples may often collapse: the weights will concen- 
trate onto a single sample so that its weight is very close to one, while the 
other samples only have negligible (almost zero) weights [12]. For example, 
the authors in [75] showed that for a 200-dimensional system, at least 10^^ 
samples are required with the PF approach. Thus it may need substantial 
computational cost in dealing with the large samples, for example, updating 
the weight of each sample at the filtering step. 

As particle filtering is a relatively new field, there shall be improvements 
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to make the PF perform better. For example, one may design new sampling 
algorithms to make the approximation error of the PF decrease more rapidly. 
As a result, one may generate, on average, less samples for the PF equipped 
with new sampling algorithms to achieve the same performance as the PF 
equipped with conventional sampling algorithms. One example in this aspect 
ca. be dated baeU to an ea.,y re.ea..eh pape. PO,. where the author conjee- 
tured that, in certain circumstances, "systematic sampling" □ can generate 
samples that converge faster than those generated by random sampling. In 
a recent work [28j, the authors proved that the above conjecture holds under 
certain conditions. Numerical results in [28] also showed that, although with 
fewer samples, the PF equipped with systematic sampling can achieve better 
accuracy than the PF equipped with some of the random sampling meth- 
ods. Note that, however, even with systematic sampling, the problem of the 
curse-of-dimensionality remains in the PF. Thus further efforts are needed 
to tackle this problem. 



^Systematic sampling involves taking a sample from the available data in a set pattern, 
rather than at random. See, for example, [171 ch. 7] for more details. 
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Appendix A 

An alternative derivation of 
analysis update formula of the 
conventional Kalman filter 



Here we follow [\Al to show that one can also derive analysis update formula 
Eq. fl2.6p at the filtering step of the conventional Kalman filter by minimizing 
the following weighted quadratic cost function 



J(xfc) = i(x,-x^)^(Pt)-^(xfc-xt) + i(yfe-7^,x,)^R-l(y,-7^,x,). (A.l) 



To see this, we solve 

rfJ(Xfc)l 
dXfc 



0. (A.2) 
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This leads to 



(A.3) 



With some algebra, we have 

= x^, + [{Plr' + nlR^.'Hk] nlR^\yk - HuA) . (A.4) 



Comparing Eq. (]A.4p with Eqs. (12.61) and (12.200 . it is clear that we need to 



prove that the Kalman gain in Eq. (12. 6p satisfies 
Kfc = Pk'Hl (TCkPlHl + Rfc) 



(A.5) 



To this end, we need to employ the matrix inversion lemma [TH ch. 1], 
which reads 

(A + BD-^C)"^ = A-^ - A^B (D + CA-^B)"^ CA^^ , (A.6) 



where A, B, C and D are matrices with suitable dimensions. 
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Applying the matrix inversion lemma, we have 



R,' - [UkPlnl + R,)"' (r, [HkPlHiy' 

Rk^ — {Rk ('^fcPfc'^fc ) Rfc + R 

- ^k^ ((^fcPfc^fc) ^ + Rfc^) Rfc^ 
Pk'Hl {KkPlHl + Rfc) ^ . 



The last step is obtained by letting A 
{HkPinl)-' in Eq. (jAl). 



Rfc, B 



C 



(A.7) 



I, and D 
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Appendix B 
Proof of Eq. 



The proof provided here mainly follows the procedures given in ^T, § 2.2]. 
We suppose that is an m-dimensional state vector defined on the m- 
dimensional real space for any k. For notational convenience, we will 
drop the domain of definition of x^ in subsequent deductions. 
For our purpose, given 

p (xfc|Yfc_i) = j iV (xfe : Mk,k-i Xfc-i, Qfc) (xfe_i : x^_i, P^_i) dxk-i , 
we need to show that 

p(x,|Yfc_i) = iV(xfc:xtP^) , (B.l) 
where x^ and P^ are given by Eq. (I2.26p . 
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Note that 



N (xfc : Mk,k-i Xfc-i, Qfc) (B.2a) 
= {27r)~"'^^ (detQfe)~^''^exp |-i (x^ - Mk,k-i^k-iV Qk^ i^k - Mk,k-i^k- 



and 



N {^k-i : (B.3a) 
= (27r)-™/^ (det PU)-'^' exp |-1 (x,_i - ^t_,f (P^.,)"^ (x,_i - 4-,) } 

where det* denotes the determinant of a matrix, and exp (•) means the 
exponential function. 
Thus 



N (xfe : Mk,k-i xjt-i, Qfe) N (xjk_i : x^.^, P^_i) 
= (27r)-"^ (detP^i)"'/' (detQfc)-'/' x 

exp I (xfc - Mk,k-i Xfe-i)'^ Qfe ^ (xfe - Mk,k-i Xfc-i) 

- ^ (x/c-i - x^_i) (PLi) (x,_i - x«_,) } 

= (27r)-™ (detP^i)"'/' (dct Qfc)-'/' x 

exp {xfe_iAfeiiXfe_i - Xfe_ibfe_i - bfe_iXfc_i + Cfe_i} , 



(B.4) 
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where 



K-i = (PLi) + Ml,_,Q,'M,,,.^ , (B.5a) 
bfc-i = {PUy'^U + Ml,_,Q-,'^,, (B.5b) 
c,_i = {^Uf {Pl_,)-'^l_, + ^lQ,'^,. (B.5c) 



After integration, one has 



j exp {x^_iA^l^Xfe_i - x^_ibfc_i - b^_iXfc_i + Cfc_i} rfxfc_i 
= (27r)"^/^ (det A,_i)'^' exp 1 1 (bLiA,_ibfc_i - c,_^) | . 



(B.6) 



Substituting Eq. flR5|) into Eq. flR6D . with some algebra (also cf. [72, § 2.2]), 
it can be shown that 

bLiAfc_ibfc_i - Cfc_i = - (xfc - x^)^ (P^)"' (xfc - x^) , (B.7) 

where 

x.l=Mk,k-i^U, (B.8a) 
= Mk,k-i PLi Ml,_^ + . (B.8b) 
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With the above results, one has 



p (xfcl Yfc_i) = j N{^k- Mk,k-i Xfc-i, Qfe) (xfc_i : x^_^, P^.^) dxk_i 
= (27r)-'"/' (detP^i)-'/' (detQfc)-^/^ (det A,_i)'/' x 
exp|-i(x,-x^)^(P^)-' (x,-xt)| . 

(B.9) 

Now we examine the determinants before the exponential function in 
Eq. (lB.9p . To this end, the following identities, which hold for arbitrary 
matrices U and V with suitable dimensions, will be used: 

det (U~^) = (detU)"^ , 

det (UV) = (detU) (detV) , (B.IO) 
det (I + UV) = det (I + VU) . 
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Thus, 



(det PU) (det Qfc)-^/' (det A.^i)^/' 

= (det P^_0 (det Q,)-^/^ (det ( (?»_,) + Air,,_iQ,-^A1fe,fe-i) ) 
= (detQ,)-'/' (det (l + P^iAlJ,_iQ-^A^fc,,_i))"'/' 
= (det Qfe)-^/^ (det (I + MM-iPLiA^M-iQfe '))"'^' 
= (det (Qfc + ^,,,_iP^_iA<J,_,))"'^' 

= (detpy-^/^ . 



p (xfc|Yfc_i) = / iV (xfc : A<fe,fe_i Xfe_i, Qfc) iV (x^.i : x^_^, P^_ J rfx^-i 



(B.ll) 



Therefore 



(27r)-/^ (detPy-^/^xp 



{-^(x.-4r(py-^(x.-xa} 



(B.12) 
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Appendix C 



Accuracy comparison between 
the ensemble Kalman filter and 
the unscented Kalman filter 

Here we make an accuracy comparison between the ensemble Kalman filter 
(EnKF) and the unscented Kalman filter (UKF) in solving the recast problem 
in Fig. 14.11 The analysis of the UKF follows the works [iQl HI] , while the 
analysis of the EnKF follows our original work [55] . 

Given a random variable x with the known mean x and covariance P^;, 
we are interested in estimating the mean and covariance of the transformed 
random variable rj = J-' (x) . In analysis, we assume that the nonlinear 
function JF (x) can be expanded as a Taylor series, which converges to the 
true value of the variable r] [IQ], HI] . 
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Note that the unscented transform (UT) can be apphed to more general 
situations [iQl HI]. For example, the transformed random variable may be 
described by 77 = ^ (z, u), where z is a Gaussian random variable with mean 
z and covariance P^;, and the dynamical noise u is independent of z, following 
a Gaussian distribution with mean zero and covariance Q. To apply the UT, 
we adopt the joint state x = [z"^, u^]^, so that the transformation becomes 
?7 = JF (x). In this case, the sigma points of the joint state x can be generated 
in the spirit of Eq. (14.41) . i.e.. 



^ = 1,2,--- ,L, 



^P,. ) , i = L + 1, L + 2, ■ ■ ■ , 2L, 

i-L 



(C.l) 



where means the zero vector, 



P. 
Q 



(C.2) 



is the covariance matrix of the joint state x, and (y^P^)^ means the i-th 
column vector of the square root matrix \fP2 . Therefore we can just use 
T] = J-' (x) as the general form in our discussion. 
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The actual mean and covariance of the trans- 
formed variable in terms of Taylor series 

Given a vector x and a Gaussian perturbation (5x with zero mean and co- 
variance Pa:, we first expand the transform t] = jF(x + 5x) in a Taylor series 
around the point x. The mean f/ is then given by (cf. Eq. (14.191) ) 



for a positive integer number i. 

Similarly, the covariance matrix is given by (cf. Eq. fl4.16bp ) 



= E(J^(x + (5x)) 

= -^(x) + ^ ( V^P. V) + ^ E (dL^) + ■ ■ ■ , 



(C.3) 



where the operator D^^ is defined as 



(C.4) 




D,.^(DL^) 



T 



(V^)^P.(V^)+E 



3! 



(C.5) 



+ 



2! X 2! 3! 




) 



+■■■. 
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Accuracies of the EnKF 



Given an ensemble {xj}"^^, the sample mean 17 of the transformed variable 
rj in the EnKF is given by 



1 " 



V 

n 
1=1 



n n X 2! n x 3! 



(C.6) 



where 5xj = Xj — x. 

On the rhs of Eq. flaej) . 



n X 2! 2! \ n 



which is a biased estimation of the second term on the rhs of Eq. fIC.SP 



Moreover, due to the effect of finite ensemble size, the odd-order derivative 
terms like 

n / n \T 



n ^ — ' \ n 

1=1 \ 1=1 



and 

^ n / ^ n \ 

" " " " " " (C.8) 



1=1 \ i=l / 



may not vanish. 
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Similarly, we have the sample covariance P„ given by 



1 " 

-5^(-F(x,)-7}) (-F(x,)-r)r 

n — 1 ^-^ 

i=l 

1 



n — 1 



3! 



2! X 2! 



Er=iDL,-^(D.x.-^)^ 
3! 



n — 1 



n 



2! 



V^P.V 
2! 



+ 



(C.9) 



Note that here 



1 " 

Px = 5xi(5x^ 

n — 1 ^-^ 

1=1 



is an unbiased estimation of P^.. 



Comparing Eq. (\C.9\i with Eq. (IC.Sp . we note that 



there are also some spurious modes, for example, terms like ^ D^x^-^ (-DL,-^) 



1=1 



and E DL,J^ (D5x,J^)^ , arising in Eq. ([ClD; 

i=l L ' -I 



the term 



n — 1 



n 



v^p.vKf 



in Eq. (1C.9P is biased. 



259 



Accuracies of the unscented transform 



For the UT, given a set of sigma points with mean x and covariance 

Pa;, the sample mean is given by 



2L 
1=0 

1 



2L 



(C.IO) 



^(x)+±(v-p.v)^+^$:(DL.^+...) 

^ ' i=l 



where 5xj = — x. Note that in Eq. fIC.lOl) . the first and third order 
derivative terms vanish because of the symmetry in sigma points. 

Comparing Eq. (IC.IOI) with Eq. (1C.3I) . we note that, second and third 
order derivative terms in Eq. (IC.lOl) match those in Eq. (10.30 exactly, while 
the difference starts from fourth order terms. 

Similarly, the sample covariance is given by 



2L 



i=0 



1 /'E?iD.x.-F(DL,^)^ 



2! X 2! 



3! 



2! 



2! 



(C.ll) 



2L r 



Clearly, unlike Eq. (]C.9p . there are no terms like ^ 

i=l 
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and JF (D^x,.^)"^ arising in Eq. fIC.lip because of the symmetry in 

1=1 L ' 'J 

sigma points. Moreover, there is also no bias in the term [(V"^P2,'V) JF] [(V"^P2,-V) JF] . 
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